Introduction¶

Chronic Kidney disease is the gradual loss of function of the kidney with no symptoms being manifested. 1 It's difficult to know the burden of the disease since they are no accurate diagnostic tests according to research done here. It could be characterized by uremic frost; however, careful diagnosis of the condition should be followed such as testing kidney function URI scan dripstick test for example the specific gravity -- low values(1.01 - 1.010) could mean that the patient has kidney damage, observation of the urine using microscopy and identification of casts) and other tests can help make a proper diagnosis.

In this notebook, we'll use data with 25 features that could be indicative of chronic kidney disease to see if predictive modelling could help us figure out which patients have chronic kidney disease. You can read more about the dataset using this link. Let's proceed to exploratory data analysis.

I first import all the packages that could be useful in wrangling, visualization and statistical modelling. I apologise if there's a package here that I have imported but I haven't used it. It may have slipped my mind for some reason.

In [1]:
import numpy as np # numeric processing
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.feature_extraction import DictVectorizer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from IPython.display import HTML
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score, classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from functools import *
import ydata_profiling
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline

Turicreate is a machine learning library by apple. There's some functionality that I was interested in that I wanted to try. In future, I may add it to the notebook. If you want more information about the library. Find for information here.

In [2]:
#!pip install turicreate -q

Loading data and Exploratory data analysis¶

In this analysis, we'll do predictive modelling in hopes of finding a model which will be able to classify the patients appropriately.

Download the dataset from here and load it into the notebook.

In [3]:
!chmod a+x get_data.sh
!./get_data.sh
Downloading dataset from kaggle
This may take a few minutes...
Link: https://www.kaggle.com/mansoordaku/ckdisease
Attempt 1 of 5...
Warning: Looks like you're using an outdated API Version, please consider updating (server 1.7.4.2 / client 1.6.17)
Dataset URL: https://www.kaggle.com/datasets/mansoordaku/ckdisease
License(s): unknown
Downloading ckdisease.zip to /home/stormbird/Desktop/chronic-kidney-disease-kaggle
  0%|                                               | 0.00/9.51k [00:00<?, ?B/s]
100%|██████████████████████████████████████| 9.51k/9.51k [00:00<00:00, 10.8MB/s]
Dataset downloaded successfully.
Archive:  ckdisease.zip
  inflating: kidney_disease.csv      
Dataset downloaded successfully and moved to data/input folder
kidney_disease.csv
In [4]:
# load the dataset with pandas read_csv function
df = pd.read_csv('data/input/kidney_disease.csv', index_col="id")

# give the dtypes of the columns if the data was squeeky clean
dtypes = {
    'id' : np.int32,
    'age' : np.int32,
    'bp' : np.float32,
    'sg' : object, # category
    'al' : object, # category # mistake
    'su' : object, #category  # mistake
    'rbc' : object, # category
    'pc' : object, # category
    'pcc' : object, # category
    'ba' : object, # category
    'bgr' : np.float32,
    'bu' : np.int32,
    'sc' : np.float32,
    'sod': np.int32,
    'pot' : np.float32,
    'hemo' : np.float32,
    'pcv' : np.int32,
    'wc' : np.int32,
    'rc' : np.int32,
    'htn' : object,
    'dm' : object,
    'cad' : object,
    'appet': object,
    'pe' : object,
    'ane' : object,
    'class': object}

# another way of reading in the datasets especially very big files like 1GB big
# df2 = dd.read_csv('../input/kidney_disease.csv', dtype=dtypes)
# id                400 non-null int64
# age               391 non-null float64
# bp                388 non-null float64
# sg                353 non-null float64
# al                354 non-null float64
# su                351 non-null float64
# rbc               248 non-null object
# pc                335 non-null object
# pcc               396 non-null object
# ba                396 non-null object
# bgr               356 non-null float64
# bu                381 non-null float64
# sc                383 non-null float64
# sod               313 non-null float64
# pot               312 non-null float64
# hemo              348 non-null float64
# pcv               330 non-null object
# wc                295 non-null object
# rc                270 non-null object
# htn               398 non-null object
# dm                398 non-null object
# cad               398 non-null object
# appet             399 non-null object
# pe                399 non-null object
# ane               399 non-null object
# classification    400 non-null object
In [5]:
# see the first couple of observations and transpose 10 observations
# think of it as rolling over your dataset
df.head(10).transpose()
Out[5]:
id 0 1 2 3 4 5 6 7 8 9
age 48.0 7.0 62.0 48.0 51.0 60.0 68.0 24.0 52.0 53.0
bp 80.0 50.0 80.0 70.0 80.0 90.0 70.0 NaN 100.0 90.0
sg 1.02 1.02 1.01 1.005 1.01 1.015 1.01 1.015 1.015 1.02
al 1.0 4.0 2.0 4.0 2.0 3.0 0.0 2.0 3.0 2.0
su 0.0 0.0 3.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0
rbc NaN NaN normal normal normal NaN NaN normal normal abnormal
pc normal normal normal abnormal normal NaN normal abnormal abnormal abnormal
pcc notpresent notpresent notpresent present notpresent notpresent notpresent notpresent present present
ba notpresent notpresent notpresent notpresent notpresent notpresent notpresent notpresent notpresent notpresent
bgr 121.0 NaN 423.0 117.0 106.0 74.0 100.0 410.0 138.0 70.0
bu 36.0 18.0 53.0 56.0 26.0 25.0 54.0 31.0 60.0 107.0
sc 1.2 0.8 1.8 3.8 1.4 1.1 24.0 1.1 1.9 7.2
sod NaN NaN NaN 111.0 NaN 142.0 104.0 NaN NaN 114.0
pot NaN NaN NaN 2.5 NaN 3.2 4.0 NaN NaN 3.7
hemo 15.4 11.3 9.6 11.2 11.6 12.2 12.4 12.4 10.8 9.5
pcv 44 38 31 32 35 39 36 44 33 29
wc 7800 6000 7500 6700 7300 7800 NaN 6900 9600 12100
rc 5.2 NaN NaN 3.9 4.6 4.4 NaN 5 4.0 3.7
htn yes no no yes no yes no no yes yes
dm yes no yes no no yes no yes yes yes
cad no no no no no no no no no no
appet good good poor poor good good good good good poor
pe no no no yes no yes no yes no no
ane no no yes yes no no no no yes yes
classification ckd ckd ckd ckd ckd ckd ckd ckd ckd ckd
In [6]:
# see the column names
df.columns
Out[6]:
Index(['age', 'bp', 'sg', 'al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr', 'bu',
       'sc', 'sod', 'pot', 'hemo', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad',
       'appet', 'pe', 'ane', 'classification'],
      dtype='object')
In [7]:
# see a concise summary of the dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 400 entries, 0 to 399
Data columns (total 25 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             391 non-null    float64
 1   bp              388 non-null    float64
 2   sg              353 non-null    float64
 3   al              354 non-null    float64
 4   su              351 non-null    float64
 5   rbc             248 non-null    object 
 6   pc              335 non-null    object 
 7   pcc             396 non-null    object 
 8   ba              396 non-null    object 
 9   bgr             356 non-null    float64
 10  bu              381 non-null    float64
 11  sc              383 non-null    float64
 12  sod             313 non-null    float64
 13  pot             312 non-null    float64
 14  hemo            348 non-null    float64
 15  pcv             330 non-null    object 
 16  wc              295 non-null    object 
 17  rc              270 non-null    object 
 18  htn             398 non-null    object 
 19  dm              398 non-null    object 
 20  cad             398 non-null    object 
 21  appet           399 non-null    object 
 22  pe              399 non-null    object 
 23  ane             399 non-null    object 
 24  classification  400 non-null    object 
dtypes: float64(11), object(14)
memory usage: 81.2+ KB
  • 26 columns and a variable number of observations per feature/variable

  • 400 rows for each id - there could be missing data among the rows of the variable

In [8]:
# display summary statistics of each column
# this helps me confirm my assertion on missing data
df.describe(include="all").transpose()
Out[8]:
count unique top freq mean std min 25% 50% 75% max
age 391.0 NaN NaN NaN 51.483376 17.169714 2.0 42.0 55.0 64.5 90.0
bp 388.0 NaN NaN NaN 76.469072 13.683637 50.0 70.0 80.0 80.0 180.0
sg 353.0 NaN NaN NaN 1.017408 0.005717 1.005 1.01 1.02 1.02 1.025
al 354.0 NaN NaN NaN 1.016949 1.352679 0.0 0.0 0.0 2.0 5.0
su 351.0 NaN NaN NaN 0.450142 1.099191 0.0 0.0 0.0 0.0 5.0
rbc 248 2 normal 201 NaN NaN NaN NaN NaN NaN NaN
pc 335 2 normal 259 NaN NaN NaN NaN NaN NaN NaN
pcc 396 2 notpresent 354 NaN NaN NaN NaN NaN NaN NaN
ba 396 2 notpresent 374 NaN NaN NaN NaN NaN NaN NaN
bgr 356.0 NaN NaN NaN 148.036517 79.281714 22.0 99.0 121.0 163.0 490.0
bu 381.0 NaN NaN NaN 57.425722 50.503006 1.5 27.0 42.0 66.0 391.0
sc 383.0 NaN NaN NaN 3.072454 5.741126 0.4 0.9 1.3 2.8 76.0
sod 313.0 NaN NaN NaN 137.528754 10.408752 4.5 135.0 138.0 142.0 163.0
pot 312.0 NaN NaN NaN 4.627244 3.193904 2.5 3.8 4.4 4.9 47.0
hemo 348.0 NaN NaN NaN 12.526437 2.912587 3.1 10.3 12.65 15.0 17.8
pcv 330 44 41 21 NaN NaN NaN NaN NaN NaN NaN
wc 295 92 9800 11 NaN NaN NaN NaN NaN NaN NaN
rc 270 49 5.2 18 NaN NaN NaN NaN NaN NaN NaN
htn 398 2 no 251 NaN NaN NaN NaN NaN NaN NaN
dm 398 5 no 258 NaN NaN NaN NaN NaN NaN NaN
cad 398 3 no 362 NaN NaN NaN NaN NaN NaN NaN
appet 399 2 good 317 NaN NaN NaN NaN NaN NaN NaN
pe 399 2 no 323 NaN NaN NaN NaN NaN NaN NaN
ane 399 2 no 339 NaN NaN NaN NaN NaN NaN NaN
classification 400 3 ckd 248 NaN NaN NaN NaN NaN NaN NaN
In [9]:
# Looking at variables interractively

profile = ydata_profiling.ProfileReport(df)

profile
Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]
Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]
Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]
Out[9]:

The good news is that we can work with the current state of the columns since they have been labelled consistently. Bad news is that we have a lot of missing data in this dataset. Let's proceed and find out the number of missing values per column and if the classes are balanced or unbalanced. The profiler did the work already but sometimes it is good to confirm it your own way.

In [10]:
# looking for the number of missing observations
# In the code below a boolean is being tried on each observation asking if the observation is missing or not
# then add all instances of NaN(Not a number)
missing_values = df.isnull().sum()

# calculating the percentage of missing values in the dataframe
# simply taking the sum of the values we got above dividing by the no of observations in the df
# you could use len(df) instead df.index.size
missing_count_pct = ((missing_values / df.index.size) * 100)

# see how many observations are missing
print(missing_count_pct)
age                2.25
bp                 3.00
sg                11.75
al                11.50
su                12.25
rbc               38.00
pc                16.25
pcc                1.00
ba                 1.00
bgr               11.00
bu                 4.75
sc                 4.25
sod               21.75
pot               22.00
hemo              13.00
pcv               17.50
wc                26.25
rc                32.50
htn                0.50
dm                 0.50
cad                0.50
appet              0.25
pe                 0.25
ane                0.25
classification     0.00
dtype: float64
In [11]:
# take the missing count percentage and use boolean mask to filter out columns
# whose observation threshold is greater than 25 percent
# 25 is a random number chosen based how much missing data is in the dataset
# 25-50 is normally a red flag since most data is missing
columns_to_drop = missing_count_pct[missing_count_pct > 25].index

# remove columns that meet that threshold and save result in column df_dropped
df_dropped = df.drop(columns_to_drop, axis=1)
In [12]:
# number of columns remaining after filtering
df.columns.size - df_dropped.columns.size

# only three columns are lost
Out[12]:
3

I really hate losing a few columns. I won't throw everything away. But, I will keep these columns while we are doing predictive modelling use the different variants of the datasets and see if there will be any boost in results. In the meantime, let's look at the code book to come up with a hypothesis to find out which columns are the most important and converting the types of each column to another format that will speed up computation during training.

In [13]:
# look at the code book on kaggle and write which columns could be useful here

According to the original site where we found data here.. I found the identity of the columns rather what the columns mean. I'll put a star on the columns i think are important from my background in medical laboratory science. Then the second run through this notebook we could explore only the columns i think are important and lastly use a technique called singular value decomposition to figure out which ones are the most important.

age - age

bp - blood pressure *

sg - specific gravity *

al - albumin *

su - sugar *

rbc - red blood cells *

pc - pus cell*

pcc - pus cell clumps *

ba - bacteria*

bgr - blood glucose random

bu - blood urea*

sc - serum creatinine

sod - sodium

pot - potassium

hemo - hemoglobin*

pcv - packed cell volume

wc - white blood cell count*

rc - red blood cell count*

htn - hypertension*

dm - diabetes mellitus*

cad - coronary artery disease*

appet - appetite*

pe - pedal edema*

ane - anemia*

class - class*

In [14]:
# checking the types of the column to figure out the best next steps of conversion of data types
df.dtypes
Out[14]:
age               float64
bp                float64
sg                float64
al                float64
su                float64
rbc                object
pc                 object
pcc                object
ba                 object
bgr               float64
bu                float64
sc                float64
sod               float64
pot               float64
hemo              float64
pcv                object
wc                 object
rc                 object
htn                object
dm                 object
cad                object
appet              object
pe                 object
ane                object
classification     object
dtype: object

Review the columns from the original codebook to determine the datatypes then make a schema which we can follow as i import the dataset

In [15]:
# fix the columns to be of the categorical type
# if the value is missing replace the NA with the word miss
constant_imputer = SimpleImputer(strategy="constant", fill_value = "miss")

# apply it to categorical columns
df[["rbc"]] = constant_imputer.fit_transform(df[["rbc"]])
df[["pcc"]] = constant_imputer.fit_transform(df[["pcc"]])

# converting the types to be categorical
# Go ahead and use a function here
df['rbc'] = df['rbc'].astype("category")
df['pc'] = df['pc'].astype("category")
df["pcc"] = df['pcc'].astype("category")
df['ba'] = df['ba'].astype("category")
df['appet'] = df['appet'].astype("category")
df['pe'] = df['pe'].astype("category")
df['ane'] = df['ane'].astype("category")
df['classification'] = df['classification'].astype("category")
df['htn'] = df['htn'].astype("category")
df['dm'] = df['dm'].astype("category")
df['cad'] = df['cad'].astype("category")


# confirm the dtypes now
df.dtypes
Out[15]:
age                float64
bp                 float64
sg                 float64
al                 float64
su                 float64
rbc               category
pc                category
pcc               category
ba                category
bgr                float64
bu                 float64
sc                 float64
sod                float64
pot                float64
hemo               float64
pcv                 object
wc                  object
rc                  object
htn               category
dm                category
cad               category
appet             category
pe                category
ane               category
classification    category
dtype: object
In [16]:
# seeing the columns in list form thinking mode
df.columns
Out[16]:
Index(['age', 'bp', 'sg', 'al', 'su', 'rbc', 'pc', 'pcc', 'ba', 'bgr', 'bu',
       'sc', 'sod', 'pot', 'hemo', 'pcv', 'wc', 'rc', 'htn', 'dm', 'cad',
       'appet', 'pe', 'ane', 'classification'],
      dtype='object')
In [17]:
# make a copy of the whole dataset
df_copy = df.copy()

# remove the target column for the other uses in the next steps
#df = df.drop("classification", axis = 1)
In [18]:
# using a boolean to figure out which columns are of type object and numeric to do other preprocessing
# in the workflow
object_columns = df.dtypes == "object"
numeric_columns = df.dtypes == "float64"
category_columns = df.dtypes == "category"
In [19]:
# use regular expressions to fix it and this is supposed to be one of the first steps after df.dtypes command. If it's categorical
# you can replace it with anything you want here I use -999 to replace the data entries the tab character to flag them as outliers.
# I change the dtypes so to 32 bit to save memory

df['pcv'] = df['pcv'].replace("\t?",-999).fillna(0).astype("int32") # use  str.replace on column to something meaningful
df['wc'] = df['wc'].replace("\t?", -999).fillna(0).astype("int32") # use  str.replace on column to something meaningful
df['rc'] = df['rc'].replace("\t?", -999).fillna(0).astype("float32") # use  str.replace on column to something meaningful

# exploring another imputation strategy that uses the median
# mean_imputer = SimpleImputer(strategy="median")
# df["pcv"] = mean_imputer.fit_transform(df["pcv"])
# df["wc"] = mean_imputer.fit_transform(df["wc"])
# df["rc"] = mean_imputer.fit_transform(df["rc"])
In [20]:
# write code to extract columns of the type object and numeric
# Make a boolean mask for categorical columns
cat_mask_obj = (df.dtypes == "object") | (df.dtypes == "category")

# Get list of categorical column names
cat_mask_object = df.columns[cat_mask_obj].tolist()

# now for numerical columns
# anything that was parsed as float64 is numeric: make a boolean mask for that
cat_mask_numeric = (df.dtypes == "float64")
cat_mask_numeric = df.columns[cat_mask_numeric].tolist()

# see the result in a combined list: to the left categorical and the right we have numeric columns
print(cat_mask_object, "\n", cat_mask_numeric)
['rbc', 'pc', 'pcc', 'ba', 'htn', 'dm', 'cad', 'appet', 'pe', 'ane', 'classification'] 
 ['age', 'bp', 'sg', 'al', 'su', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo']
In [21]:
# convert all instances of the float 64 to float 32 to speed up computation in the subsequent steps
# remove all the missing values and make sure that they are all numeric
numeric_columns_float32 = df[cat_mask_numeric].astype("float32").fillna(0)
In [22]:
#it's worked
numeric_columns_float32.dtypes
Out[22]:
age     float32
bp      float32
sg      float32
al      float32
su      float32
bgr     float32
bu      float32
sc      float32
sod     float32
pot     float32
hemo    float32
dtype: object
In [23]:
# Task: split the category columns and object columns to the right type
# you can import the dataset to have the right types too upon import
# you can do this the next time you have time to continue working on this add it as comment though

They are some columns that are wrongly parsed due to the NAs. They include: pcv(numerical int32), rc(numerical int32). I can either interpolate the missing columns depending how they'll look like in a plot or use mean/median to find the value.

In [24]:
# it makes sense that they are some individuals who were not sampled therefore filling the whole dataset with NAs makes sense
# these two columns have data entry problems
# use regular expressions to fix it and this is supposed to be one of the first steps after df.dtypes command. If it's categorical
# you can replace it with anything you want
#df['pcv'] = df['pcv'].fillna(0, inplace = True)
#df['rc'] = df['rc'].fillna(0, inplace = True)
#df['wc'] = df['wc'].fillna(0, inplace = True)
In [25]:
# finding the number of null or NA values in the columns
pd.isnull(df).sum()
Out[25]:
age                9
bp                12
sg                47
al                46
su                49
rbc                0
pc                65
pcc                0
ba                 4
bgr               44
bu                19
sc                17
sod               87
pot               88
hemo              52
pcv                0
wc                 0
rc                 0
htn                2
dm                 2
cad                2
appet              1
pe                 1
ane                1
classification     0
dtype: int64
In [26]:
# checking the dtypes once more
df.dtypes
Out[26]:
age                float64
bp                 float64
sg                 float64
al                 float64
su                 float64
rbc               category
pc                category
pcc               category
ba                category
bgr                float64
bu                 float64
sc                 float64
sod                float64
pot                float64
hemo               float64
pcv                  int32
wc                   int32
rc                 float32
htn               category
dm                category
cad               category
appet             category
pe                category
ane               category
classification    category
dtype: object
In [27]:
# concatentate the numeric columns with the category columns to build the full dataset and then X and Y
# remove
df[cat_mask_object] = constant_imputer.fit_transform(df[cat_mask_object])
In [28]:
# check for missing values
print(df[cat_mask_object].isnull().sum())
print("*" * 100)
print(numeric_columns_float32.isnull().sum())
rbc               0
pc                0
pcc               0
ba                0
htn               0
dm                0
cad               0
appet             0
pe                0
ane               0
classification    0
dtype: int64
****************************************************************************************************
age     0
bp      0
sg      0
al      0
su      0
bgr     0
bu      0
sc      0
sod     0
pot     0
hemo    0
dtype: int64
In [29]:
# bring the columns together with pd.concat
df_clean = pd.concat([numeric_columns_float32, df[cat_mask_object]], axis = 1)

# check the shape of the columns
df_clean.shape
Out[29]:
(400, 22)
In [30]:
# just see the first 10 observations
df_clean.head(10)
# HTML(df_clean.to_html()) see the whole dataframe in HTML format
Out[30]:
age bp sg al su bgr bu sc sod pot ... pc pcc ba htn dm cad appet pe ane classification
id
0 48.0 80.0 1.020 1.0 0.0 121.0 36.0 1.2 0.0 0.0 ... normal notpresent notpresent yes yes no good no no ckd
1 7.0 50.0 1.020 4.0 0.0 0.0 18.0 0.8 0.0 0.0 ... normal notpresent notpresent no no no good no no ckd
2 62.0 80.0 1.010 2.0 3.0 423.0 53.0 1.8 0.0 0.0 ... normal notpresent notpresent no yes no poor no yes ckd
3 48.0 70.0 1.005 4.0 0.0 117.0 56.0 3.8 111.0 2.5 ... abnormal present notpresent yes no no poor yes yes ckd
4 51.0 80.0 1.010 2.0 0.0 106.0 26.0 1.4 0.0 0.0 ... normal notpresent notpresent no no no good no no ckd
5 60.0 90.0 1.015 3.0 0.0 74.0 25.0 1.1 142.0 3.2 ... miss notpresent notpresent yes yes no good yes no ckd
6 68.0 70.0 1.010 0.0 0.0 100.0 54.0 24.0 104.0 4.0 ... normal notpresent notpresent no no no good no no ckd
7 24.0 0.0 1.015 2.0 4.0 410.0 31.0 1.1 0.0 0.0 ... abnormal notpresent notpresent no yes no good yes no ckd
8 52.0 100.0 1.015 3.0 0.0 138.0 60.0 1.9 0.0 0.0 ... abnormal present notpresent yes yes no good no yes ckd
9 53.0 90.0 1.020 2.0 0.0 70.0 107.0 7.2 114.0 3.7 ... abnormal present notpresent yes yes no poor no yes ckd

10 rows × 22 columns

In [31]:
# now see the bottom 10
df_clean.tail(10)
Out[31]:
age bp sg al su bgr bu sc sod pot ... pc pcc ba htn dm cad appet pe ane classification
id
390 52.0 80.0 1.025 0.0 0.0 99.0 25.0 0.8 135.0 3.7 ... normal notpresent notpresent no no no good no no notckd
391 36.0 80.0 1.025 0.0 0.0 85.0 16.0 1.1 142.0 4.1 ... normal notpresent notpresent no no no good no no notckd
392 57.0 80.0 1.020 0.0 0.0 133.0 48.0 1.2 147.0 4.3 ... normal notpresent notpresent no no no good no no notckd
393 43.0 60.0 1.025 0.0 0.0 117.0 45.0 0.7 141.0 4.4 ... normal notpresent notpresent no no no good no no notckd
394 50.0 80.0 1.020 0.0 0.0 137.0 46.0 0.8 139.0 5.0 ... normal notpresent notpresent no no no good no no notckd
395 55.0 80.0 1.020 0.0 0.0 140.0 49.0 0.5 150.0 4.9 ... normal notpresent notpresent no no no good no no notckd
396 42.0 70.0 1.025 0.0 0.0 75.0 31.0 1.2 141.0 3.5 ... normal notpresent notpresent no no no good no no notckd
397 12.0 80.0 1.020 0.0 0.0 100.0 26.0 0.6 137.0 4.4 ... normal notpresent notpresent no no no good no no notckd
398 17.0 60.0 1.025 0.0 0.0 114.0 50.0 1.0 135.0 4.9 ... normal notpresent notpresent no no no good no no notckd
399 58.0 80.0 1.025 0.0 0.0 131.0 18.0 1.1 141.0 3.5 ... normal notpresent notpresent no no no good no no notckd

10 rows × 22 columns

In [32]:
HTML(df.to_html()) # just looking for something I may have missed in the pandas profiling
Out[32]:
age bp sg al su rbc pc pcc ba bgr bu sc sod pot hemo pcv wc rc htn dm cad appet pe ane classification
id
0 48.0 80.0 1.020 1.0 0.0 miss normal notpresent notpresent 121.0 36.0 1.20 NaN NaN 15.4 44 7800 5.2 yes yes no good no no ckd
1 7.0 50.0 1.020 4.0 0.0 miss normal notpresent notpresent NaN 18.0 0.80 NaN NaN 11.3 38 6000 0.0 no no no good no no ckd
2 62.0 80.0 1.010 2.0 3.0 normal normal notpresent notpresent 423.0 53.0 1.80 NaN NaN 9.6 31 7500 0.0 no yes no poor no yes ckd
3 48.0 70.0 1.005 4.0 0.0 normal abnormal present notpresent 117.0 56.0 3.80 111.0 2.5 11.2 32 6700 3.9 yes no no poor yes yes ckd
4 51.0 80.0 1.010 2.0 0.0 normal normal notpresent notpresent 106.0 26.0 1.40 NaN NaN 11.6 35 7300 4.6 no no no good no no ckd
5 60.0 90.0 1.015 3.0 0.0 miss miss notpresent notpresent 74.0 25.0 1.10 142.0 3.2 12.2 39 7800 4.4 yes yes no good yes no ckd
6 68.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent 100.0 54.0 24.00 104.0 4.0 12.4 36 0 0.0 no no no good no no ckd
7 24.0 NaN 1.015 2.0 4.0 normal abnormal notpresent notpresent 410.0 31.0 1.10 NaN NaN 12.4 44 6900 5.0 no yes no good yes no ckd
8 52.0 100.0 1.015 3.0 0.0 normal abnormal present notpresent 138.0 60.0 1.90 NaN NaN 10.8 33 9600 4.0 yes yes no good no yes ckd
9 53.0 90.0 1.020 2.0 0.0 abnormal abnormal present notpresent 70.0 107.0 7.20 114.0 3.7 9.5 29 12100 3.7 yes yes no poor no yes ckd
10 50.0 60.0 1.010 2.0 4.0 miss abnormal present notpresent 490.0 55.0 4.00 NaN NaN 9.4 28 0 0.0 yes yes no good no yes ckd
11 63.0 70.0 1.010 3.0 0.0 abnormal abnormal present notpresent 380.0 60.0 2.70 131.0 4.2 10.8 32 4500 3.8 yes yes no poor yes no ckd
12 68.0 70.0 1.015 3.0 1.0 miss normal present notpresent 208.0 72.0 2.10 138.0 5.8 9.7 28 12200 3.4 yes yes yes poor yes no ckd
13 68.0 70.0 NaN NaN NaN miss miss notpresent notpresent 98.0 86.0 4.60 135.0 3.4 9.8 0 0 0.0 yes yes yes poor yes no ckd
14 68.0 80.0 1.010 3.0 2.0 normal abnormal present present 157.0 90.0 4.10 130.0 6.4 5.6 16 11000 2.6 yes yes yes poor yes no ckd
15 40.0 80.0 1.015 3.0 0.0 miss normal notpresent notpresent 76.0 162.0 9.60 141.0 4.9 7.6 24 3800 2.8 yes no no good no yes ckd
16 47.0 70.0 1.015 2.0 0.0 miss normal notpresent notpresent 99.0 46.0 2.20 138.0 4.1 12.6 0 0 0.0 no no no good no no ckd
17 47.0 80.0 NaN NaN NaN miss miss notpresent notpresent 114.0 87.0 5.20 139.0 3.7 12.1 0 0 0.0 yes no no poor no no ckd
18 60.0 100.0 1.025 0.0 3.0 miss normal notpresent notpresent 263.0 27.0 1.30 135.0 4.3 12.7 37 11400 4.3 yes yes yes good no no ckd
19 62.0 60.0 1.015 1.0 0.0 miss abnormal present notpresent 100.0 31.0 1.60 NaN NaN 10.3 30 5300 3.7 yes no yes good no no ckd
20 61.0 80.0 1.015 2.0 0.0 abnormal abnormal notpresent notpresent 173.0 148.0 3.90 135.0 5.2 7.7 24 9200 3.2 yes yes yes poor yes yes ckd
21 60.0 90.0 NaN NaN NaN miss miss notpresent notpresent NaN 180.0 76.00 4.5 NaN 10.9 32 6200 3.6 yes yes yes good no no ckd
22 48.0 80.0 1.025 4.0 0.0 normal abnormal notpresent notpresent 95.0 163.0 7.70 136.0 3.8 9.8 32 6900 3.4 yes no no good no yes ckd
23 21.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent NaN NaN NaN NaN NaN NaN 0 0 0.0 no no no poor no yes ckd
24 42.0 100.0 1.015 4.0 0.0 normal abnormal notpresent present NaN 50.0 1.40 129.0 4.0 11.1 39 8300 4.6 yes no no poor no no ckd
25 61.0 60.0 1.025 0.0 0.0 miss normal notpresent notpresent 108.0 75.0 1.90 141.0 5.2 9.9 29 8400 3.7 yes yes no good no yes ckd
26 75.0 80.0 1.015 0.0 0.0 miss normal notpresent notpresent 156.0 45.0 2.40 140.0 3.4 11.6 35 10300 4.0 yes yes no poor no no ckd
27 69.0 70.0 1.010 3.0 4.0 normal abnormal notpresent notpresent 264.0 87.0 2.70 130.0 4.0 12.5 37 9600 4.1 yes yes yes good yes no ckd
28 75.0 70.0 NaN 1.0 3.0 miss miss notpresent notpresent 123.0 31.0 1.40 NaN NaN NaN 0 0 0.0 no yes no good no no ckd
29 68.0 70.0 1.005 1.0 0.0 abnormal abnormal present notpresent NaN 28.0 1.40 NaN NaN 12.9 38 0 0.0 no no yes good no no ckd
30 NaN 70.0 NaN NaN NaN miss miss notpresent notpresent 93.0 155.0 7.30 132.0 4.9 NaN 0 0 0.0 yes yes no good no no ckd
31 73.0 90.0 1.015 3.0 0.0 miss abnormal present notpresent 107.0 33.0 1.50 141.0 4.6 10.1 30 7800 4.0 no no no poor no no ckd
32 61.0 90.0 1.010 1.0 1.0 miss normal notpresent notpresent 159.0 39.0 1.50 133.0 4.9 11.3 34 9600 4.0 yes yes no poor no no ckd
33 60.0 100.0 1.020 2.0 0.0 abnormal abnormal notpresent notpresent 140.0 55.0 2.50 NaN NaN 10.1 29 0 0.0 yes no no poor no no ckd
34 70.0 70.0 1.010 1.0 0.0 normal miss present present 171.0 153.0 5.20 NaN NaN NaN 0 0 0.0 no yes no poor no no ckd
35 65.0 90.0 1.020 2.0 1.0 abnormal normal notpresent notpresent 270.0 39.0 2.00 NaN NaN 12.0 36 9800 4.9 yes yes no poor no yes ckd
36 76.0 70.0 1.015 1.0 0.0 normal normal notpresent notpresent 92.0 29.0 1.80 133.0 3.9 10.3 32 0 0.0 yes no no good no no ckd
37 72.0 80.0 NaN NaN NaN miss miss notpresent notpresent 137.0 65.0 3.40 141.0 4.7 9.7 28 6900 2.5 yes yes no poor no yes ckd\t
38 69.0 80.0 1.020 3.0 0.0 abnormal normal notpresent notpresent NaN 103.0 4.10 132.0 5.9 12.5 0 0 0.0 yes no no good no no ckd
39 82.0 80.0 1.010 2.0 2.0 normal miss notpresent notpresent 140.0 70.0 3.40 136.0 4.2 13.0 40 9800 4.2 yes yes no good no no ckd
40 46.0 90.0 1.010 2.0 0.0 normal abnormal notpresent notpresent 99.0 80.0 2.10 NaN NaN 11.1 32 9100 4.1 yes no \tno good no no ckd
41 45.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent NaN 20.0 0.70 NaN NaN NaN 0 0 0.0 no no no good yes no ckd
42 47.0 100.0 1.010 0.0 0.0 miss normal notpresent notpresent 204.0 29.0 1.00 139.0 4.2 9.7 33 9200 4.5 yes no no good no yes ckd
43 35.0 80.0 1.010 1.0 0.0 abnormal miss notpresent notpresent 79.0 202.0 10.80 134.0 3.4 7.9 24 7900 3.1 no yes no good no no ckd
44 54.0 80.0 1.010 3.0 0.0 abnormal abnormal notpresent notpresent 207.0 77.0 6.30 134.0 4.8 9.7 28 0 0.0 yes yes no poor yes no ckd
45 54.0 80.0 1.020 3.0 0.0 miss abnormal notpresent notpresent 208.0 89.0 5.90 130.0 4.9 9.3 0 0 0.0 yes yes no poor yes no ckd
46 48.0 70.0 1.015 0.0 0.0 miss normal notpresent notpresent 124.0 24.0 1.20 142.0 4.2 12.4 37 6400 4.7 no yes no good no no ckd
47 11.0 80.0 1.010 3.0 0.0 miss normal notpresent notpresent NaN 17.0 0.80 NaN NaN 15.0 45 8600 0.0 no no no good no no ckd
48 73.0 70.0 1.005 0.0 0.0 normal normal notpresent notpresent 70.0 32.0 0.90 125.0 4.0 10.0 29 18900 3.5 yes yes no good yes no ckd
49 60.0 70.0 1.010 2.0 0.0 normal abnormal present notpresent 144.0 72.0 3.00 NaN NaN 9.7 29 21600 3.5 yes yes no poor no yes ckd
50 53.0 60.0 NaN NaN NaN miss miss notpresent notpresent 91.0 114.0 3.25 142.0 4.3 8.6 28 11000 3.8 yes yes no poor yes yes ckd
51 54.0 100.0 1.015 3.0 0.0 miss normal present notpresent 162.0 66.0 1.60 136.0 4.4 10.3 33 0 0.0 yes yes no poor yes no ckd
52 53.0 90.0 1.015 0.0 0.0 miss normal notpresent notpresent NaN 38.0 2.20 NaN NaN 10.9 34 4300 3.7 no no no poor no yes ckd
53 62.0 80.0 1.015 0.0 5.0 miss miss notpresent notpresent 246.0 24.0 1.00 NaN NaN 13.6 40 8500 4.7 yes yes no good no no ckd
54 63.0 80.0 1.010 2.0 2.0 normal miss notpresent notpresent NaN NaN 3.40 136.0 4.2 13.0 40 9800 4.2 yes no yes good no no ckd
55 35.0 80.0 1.005 3.0 0.0 abnormal normal notpresent notpresent NaN NaN NaN NaN NaN 9.5 28 0 0.0 no no no good yes no ckd
56 76.0 70.0 1.015 3.0 4.0 normal abnormal present notpresent NaN 164.0 9.70 131.0 4.4 10.2 30 11300 3.4 yes yes yes poor yes no ckd
57 76.0 90.0 NaN NaN NaN miss normal notpresent notpresent 93.0 155.0 7.30 132.0 4.9 NaN 0 0 0.0 yes yes yes poor no no ckd
58 73.0 80.0 1.020 2.0 0.0 abnormal abnormal notpresent notpresent 253.0 142.0 4.60 138.0 5.8 10.5 33 7200 4.3 yes yes yes good no no ckd
59 59.0 100.0 NaN NaN NaN miss miss notpresent notpresent NaN 96.0 6.40 NaN NaN 6.6 0 0 0.0 yes yes no good no yes ckd
60 67.0 90.0 1.020 1.0 0.0 miss abnormal present notpresent 141.0 66.0 3.20 138.0 6.6 NaN 0 0 0.0 yes no no good no no ckd
61 67.0 80.0 1.010 1.0 3.0 normal abnormal notpresent notpresent 182.0 391.0 32.00 163.0 39.0 NaN 0 0 0.0 no no no good yes no ckd
62 15.0 60.0 1.020 3.0 0.0 miss normal notpresent notpresent 86.0 15.0 0.60 138.0 4.0 11.0 33 7700 3.8 yes yes no good no no ckd
63 46.0 70.0 1.015 1.0 0.0 abnormal normal notpresent notpresent 150.0 111.0 6.10 131.0 3.7 7.5 27 0 0.0 no no no good no yes ckd
64 55.0 80.0 1.010 0.0 0.0 miss normal notpresent notpresent 146.0 NaN NaN NaN NaN 9.8 0 0 0.0 no no \tno good no no ckd
65 44.0 90.0 1.010 1.0 0.0 miss normal notpresent notpresent NaN 20.0 1.10 NaN NaN 15.0 48 0 0.0 no \tno no good no no ckd
66 67.0 70.0 1.020 2.0 0.0 abnormal normal notpresent notpresent 150.0 55.0 1.60 131.0 4.8 NaN -999 0 0.0 yes yes no good yes no ckd
67 45.0 80.0 1.020 3.0 0.0 normal abnormal notpresent notpresent 425.0 NaN NaN NaN NaN NaN 0 0 0.0 no no no poor no no ckd
68 65.0 70.0 1.010 2.0 0.0 miss normal present notpresent 112.0 73.0 3.30 NaN NaN 10.9 37 0 0.0 no no no good no no ckd
69 26.0 70.0 1.015 0.0 4.0 miss normal notpresent notpresent 250.0 20.0 1.10 NaN NaN 15.6 52 6900 6.0 no yes no good no no ckd
70 61.0 80.0 1.015 0.0 4.0 miss normal notpresent notpresent 360.0 19.0 0.70 137.0 4.4 15.2 44 8300 5.2 yes yes no good no no ckd
71 46.0 60.0 1.010 1.0 0.0 normal normal notpresent notpresent 163.0 92.0 3.30 141.0 4.0 9.8 28 14600 3.2 yes yes no good no no ckd
72 64.0 90.0 1.010 3.0 3.0 miss abnormal present notpresent NaN 35.0 1.30 NaN NaN 10.3 0 0 0.0 yes yes no good yes no ckd
73 NaN 100.0 1.015 2.0 0.0 abnormal abnormal notpresent notpresent 129.0 107.0 6.70 132.0 4.4 4.8 14 6300 0.0 yes no no good yes yes ckd
74 56.0 90.0 1.015 2.0 0.0 abnormal abnormal notpresent notpresent 129.0 107.0 6.70 131.0 4.8 9.1 29 6400 3.4 yes no no good no no ckd
75 5.0 NaN 1.015 1.0 0.0 miss normal notpresent notpresent NaN 16.0 0.70 138.0 3.2 8.1 0 0 0.0 no no no good no yes ckd
76 48.0 80.0 1.005 4.0 0.0 abnormal abnormal notpresent present 133.0 139.0 8.50 132.0 5.5 10.3 36 6200 4.0 no yes no good yes no ckd
77 67.0 70.0 1.010 1.0 0.0 miss normal notpresent notpresent 102.0 48.0 3.20 137.0 5.0 11.9 34 7100 3.7 yes yes no good yes no ckd
78 70.0 80.0 NaN NaN NaN miss miss notpresent notpresent 158.0 85.0 3.20 141.0 3.5 10.1 30 0 0.0 yes no no good yes no ckd
79 56.0 80.0 1.010 1.0 0.0 miss normal notpresent notpresent 165.0 55.0 1.80 NaN NaN 13.5 40 11800 5.0 yes yes no poor yes no ckd
80 74.0 80.0 1.010 0.0 0.0 miss normal notpresent notpresent 132.0 98.0 2.80 133.0 5.0 10.8 31 9400 3.8 yes yes no good no no ckd
81 45.0 90.0 NaN NaN NaN miss miss notpresent notpresent 360.0 45.0 2.40 128.0 4.4 8.3 29 5500 3.7 yes yes no good no no ckd
82 38.0 70.0 NaN NaN NaN miss miss notpresent notpresent 104.0 77.0 1.90 140.0 3.9 NaN 0 0 0.0 yes no no poor yes no ckd
83 48.0 70.0 1.015 1.0 0.0 normal normal notpresent notpresent 127.0 19.0 1.00 134.0 3.6 NaN 0 0 0.0 yes yes no good no no ckd
84 59.0 70.0 1.010 3.0 0.0 normal abnormal notpresent notpresent 76.0 186.0 15.00 135.0 7.6 7.1 22 3800 2.1 yes no no poor yes yes ckd
85 70.0 70.0 1.015 2.0 NaN miss miss notpresent notpresent NaN 46.0 1.50 NaN NaN 9.9 0 0 0.0 no yes no poor yes no ckd
86 56.0 80.0 NaN NaN NaN miss miss notpresent notpresent 415.0 37.0 1.90 NaN NaN NaN 0 0 0.0 no yes no good no no ckd
87 70.0 100.0 1.005 1.0 0.0 normal abnormal present notpresent 169.0 47.0 2.90 NaN NaN 11.1 32 5800 5.0 yes yes no poor no no ckd
88 58.0 110.0 1.010 4.0 0.0 miss normal notpresent notpresent 251.0 52.0 2.20 NaN NaN NaN 0 13200 4.7 yes \tyes no good no no ckd
89 50.0 70.0 1.020 0.0 0.0 miss normal notpresent notpresent 109.0 32.0 1.40 139.0 4.7 NaN 0 0 0.0 no no no poor no no ckd
90 63.0 100.0 1.010 2.0 2.0 normal normal notpresent present 280.0 35.0 3.20 143.0 3.5 13.0 40 9800 4.2 yes no yes good no no ckd
91 56.0 70.0 1.015 4.0 1.0 abnormal normal notpresent notpresent 210.0 26.0 1.70 136.0 3.8 16.1 52 12500 5.6 no no no good no no ckd
92 71.0 70.0 1.010 3.0 0.0 normal abnormal present present 219.0 82.0 3.60 133.0 4.4 10.4 33 5600 3.6 yes yes yes good no no ckd
93 73.0 100.0 1.010 3.0 2.0 abnormal abnormal present notpresent 295.0 90.0 5.60 140.0 2.9 9.2 30 7000 3.2 yes yes yes poor no no ckd
94 65.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent 93.0 66.0 1.60 137.0 4.5 11.6 36 11900 3.9 no yes no good no no ckd
95 62.0 90.0 1.015 1.0 0.0 miss normal notpresent notpresent 94.0 25.0 1.10 131.0 3.7 NaN 0 0 0.0 yes no no good yes yes ckd
96 60.0 80.0 1.010 1.0 1.0 miss normal notpresent notpresent 172.0 32.0 2.70 NaN NaN 11.2 36 0 0.0 no yes yes poor no no ckd
97 65.0 60.0 1.015 1.0 0.0 miss normal notpresent notpresent 91.0 51.0 2.20 132.0 3.8 10.0 32 9100 4.0 yes yes no poor yes no ckd
98 50.0 140.0 NaN NaN NaN miss miss notpresent notpresent 101.0 106.0 6.50 135.0 4.3 6.2 18 5800 2.3 yes yes no poor no yes ckd
99 56.0 180.0 NaN 0.0 4.0 miss abnormal notpresent notpresent 298.0 24.0 1.20 139.0 3.9 11.2 32 10400 4.2 yes yes no poor yes no ckd
100 34.0 70.0 1.015 4.0 0.0 abnormal abnormal notpresent notpresent 153.0 22.0 0.90 133.0 3.8 NaN 0 0 0.0 no no no good yes no ckd
101 71.0 90.0 1.015 2.0 0.0 miss abnormal present present 88.0 80.0 4.40 139.0 5.7 11.3 33 10700 3.9 no no no good no no ckd
102 17.0 60.0 1.010 0.0 0.0 miss normal notpresent notpresent 92.0 32.0 2.10 141.0 4.2 13.9 52 7000 0.0 no no no good no no ckd
103 76.0 70.0 1.015 2.0 0.0 normal abnormal present notpresent 226.0 217.0 10.20 NaN NaN 10.2 36 12700 4.2 yes no no poor yes yes ckd
104 55.0 90.0 NaN NaN NaN miss miss notpresent notpresent 143.0 88.0 2.00 NaN NaN NaN 0 0 0.0 yes yes no poor yes no ckd
105 65.0 80.0 1.015 0.0 0.0 miss normal notpresent notpresent 115.0 32.0 11.50 139.0 4.0 14.1 42 6800 5.2 no no no good no no ckd
106 50.0 90.0 NaN NaN NaN miss miss notpresent notpresent 89.0 118.0 6.10 127.0 4.4 6.0 17 6500 0.0 yes yes no good yes yes ckd
107 55.0 100.0 1.015 1.0 4.0 normal miss notpresent notpresent 297.0 53.0 2.80 139.0 4.5 11.2 34 13600 4.4 yes yes no good no no ckd
108 45.0 80.0 1.015 0.0 0.0 miss abnormal notpresent notpresent 107.0 15.0 1.00 141.0 4.2 11.8 37 10200 4.2 no no no good no no ckd
109 54.0 70.0 NaN NaN NaN miss miss notpresent notpresent 233.0 50.1 1.90 NaN NaN 11.7 0 0 0.0 no yes no good no no ckd
110 63.0 90.0 1.015 0.0 0.0 miss normal notpresent notpresent 123.0 19.0 2.00 142.0 3.8 11.7 34 11400 4.7 no no no good no no ckd
111 65.0 80.0 1.010 3.0 3.0 miss normal notpresent notpresent 294.0 71.0 4.40 128.0 5.4 10.0 32 9000 3.9 yes yes yes good no no ckd
112 NaN 60.0 1.015 3.0 0.0 abnormal abnormal notpresent notpresent NaN 34.0 1.20 NaN NaN 10.8 33 0 0.0 no no no good no no ckd
113 61.0 90.0 1.015 0.0 2.0 miss normal notpresent notpresent NaN NaN NaN NaN NaN NaN 0 9800 0.0 no yes no poor no yes ckd
114 12.0 60.0 1.015 3.0 0.0 abnormal abnormal present notpresent NaN 51.0 1.80 NaN NaN 12.1 0 10300 0.0 no no no good no no ckd
115 47.0 80.0 1.010 0.0 0.0 miss abnormal notpresent notpresent NaN 28.0 0.90 NaN NaN 12.4 44 5600 4.3 no no no good no yes ckd
116 NaN 70.0 1.015 4.0 0.0 abnormal normal notpresent notpresent 104.0 16.0 0.50 NaN NaN NaN 0 0 0.0 no no no good yes no ckd
117 NaN 70.0 1.020 0.0 0.0 miss miss notpresent notpresent 219.0 36.0 1.30 139.0 3.7 12.5 37 9800 4.4 no no no good no no ckd
118 55.0 70.0 1.010 3.0 0.0 miss normal notpresent notpresent 99.0 25.0 1.20 NaN NaN 11.4 0 0 0.0 no no no poor yes no ckd
119 60.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent 140.0 27.0 1.20 NaN NaN NaN 0 0 0.0 no no no good no no ckd
120 72.0 90.0 1.025 1.0 3.0 miss normal notpresent notpresent 323.0 40.0 2.20 137.0 5.3 12.6 0 0 0.0 no yes yes poor no no ckd
121 54.0 60.0 NaN 3.0 NaN miss miss notpresent notpresent 125.0 21.0 1.30 137.0 3.4 15.0 46 0 0.0 yes yes no good yes no ckd
122 34.0 70.0 NaN NaN NaN miss miss notpresent notpresent NaN 219.0 12.20 130.0 3.8 6.0 0 0 0.0 yes no no good no yes ckd
123 43.0 80.0 1.015 2.0 3.0 miss abnormal present present NaN 30.0 1.10 NaN NaN 14.0 42 14900 0.0 no no no good no no ckd
124 65.0 100.0 1.015 0.0 0.0 miss normal notpresent notpresent 90.0 98.0 2.50 NaN NaN 9.1 28 5500 3.6 yes no no good no no ckd
125 72.0 90.0 NaN NaN NaN miss miss notpresent notpresent 308.0 36.0 2.50 131.0 4.3 NaN 0 0 0.0 yes yes no poor no no ckd
126 70.0 90.0 1.015 0.0 0.0 miss normal notpresent notpresent 144.0 125.0 4.00 136.0 4.6 12.0 37 8200 4.5 yes yes no poor yes no ckd
127 71.0 60.0 1.015 4.0 0.0 normal normal notpresent notpresent 118.0 125.0 5.30 136.0 4.9 11.4 35 15200 4.3 yes yes no poor yes no ckd
128 52.0 90.0 1.015 4.0 3.0 normal abnormal notpresent notpresent 224.0 166.0 5.60 133.0 47.0 8.1 23 5000 2.9 yes yes no good no yes ckd
129 75.0 70.0 1.025 1.0 0.0 miss normal notpresent notpresent 158.0 49.0 1.40 135.0 4.7 11.1 0 0 0.0 yes no no poor yes no ckd
130 50.0 90.0 1.010 2.0 0.0 normal abnormal present present 128.0 208.0 9.20 134.0 4.8 8.2 22 16300 2.7 no no no poor yes yes ckd
131 5.0 50.0 1.010 0.0 0.0 miss normal notpresent notpresent NaN 25.0 0.60 NaN NaN 11.8 36 12400 0.0 no no no good no no ckd
132 50.0 NaN NaN NaN NaN normal miss notpresent notpresent 219.0 176.0 13.80 136.0 4.5 8.6 24 13200 2.7 yes no no good yes yes ckd
133 70.0 100.0 1.015 4.0 0.0 normal normal notpresent notpresent 118.0 125.0 5.30 136.0 4.9 12.0 37 8400 8.0 yes no no good no no ckd
134 47.0 100.0 1.010 NaN NaN normal miss notpresent notpresent 122.0 NaN 16.90 138.0 5.2 10.8 33 10200 3.8 no yes no good no no ckd
135 48.0 80.0 1.015 0.0 2.0 miss normal notpresent notpresent 214.0 24.0 1.30 140.0 4.0 13.2 39 0 0.0 no yes no poor no no ckd
136 46.0 90.0 1.020 NaN NaN miss normal notpresent notpresent 213.0 68.0 2.80 146.0 6.3 9.3 0 0 0.0 yes yes no good no no ckd
137 45.0 60.0 1.010 2.0 0.0 normal abnormal present notpresent 268.0 86.0 4.00 134.0 5.1 10.0 29 9200 0.0 yes yes no good no no ckd
138 73.0 NaN 1.010 1.0 0.0 miss miss notpresent notpresent 95.0 51.0 1.60 142.0 3.5 NaN 0 0 0.0 no \tno no good no no ckd
139 41.0 70.0 1.015 2.0 0.0 miss abnormal notpresent present NaN 68.0 2.80 132.0 4.1 11.1 33 0 0.0 yes no no good yes yes ckd
140 69.0 70.0 1.010 0.0 4.0 miss normal notpresent notpresent 256.0 40.0 1.20 142.0 5.6 NaN 0 0 0.0 no no no good no no ckd
141 67.0 70.0 1.010 1.0 0.0 normal normal notpresent notpresent NaN 106.0 6.00 137.0 4.9 6.1 19 6500 0.0 yes no no good no yes ckd
142 72.0 90.0 NaN NaN NaN miss miss notpresent notpresent 84.0 145.0 7.10 135.0 5.3 NaN 0 0 0.0 no yes no good no no ckd
143 41.0 80.0 1.015 1.0 4.0 abnormal normal notpresent notpresent 210.0 165.0 18.00 135.0 4.7 NaN 0 0 0.0 no yes no good no no ckd
144 60.0 90.0 1.010 2.0 0.0 abnormal normal notpresent notpresent 105.0 53.0 2.30 136.0 5.2 11.1 33 10500 4.1 no no no good no no ckd
145 57.0 90.0 1.015 5.0 0.0 abnormal abnormal notpresent present NaN 322.0 13.00 126.0 4.8 8.0 24 4200 3.3 yes yes yes poor yes yes ckd
146 53.0 100.0 1.010 1.0 3.0 abnormal normal notpresent notpresent 213.0 23.0 1.00 139.0 4.0 NaN 0 0 0.0 no yes no good no no ckd
147 60.0 60.0 1.010 3.0 1.0 normal abnormal present notpresent 288.0 36.0 1.70 130.0 3.0 7.9 25 15200 3.0 yes no no poor no yes ckd
148 69.0 60.0 NaN NaN NaN miss miss notpresent notpresent 171.0 26.0 48.10 NaN NaN NaN 0 0 0.0 yes no no poor no no ckd
149 65.0 70.0 1.020 1.0 0.0 abnormal abnormal notpresent notpresent 139.0 29.0 1.00 NaN NaN 10.5 32 0 0.0 yes no no good yes no ckd
150 8.0 60.0 1.025 3.0 0.0 normal normal notpresent notpresent 78.0 27.0 0.90 NaN NaN 12.3 41 6700 0.0 no no no poor yes no ckd
151 76.0 90.0 NaN NaN NaN miss miss notpresent notpresent 172.0 46.0 1.70 141.0 5.5 9.6 30 0 0.0 yes yes no good no yes ckd
152 39.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent 121.0 20.0 0.80 133.0 3.5 10.9 32 0 0.0 no yes no good no no ckd
153 55.0 90.0 1.010 2.0 1.0 abnormal abnormal notpresent notpresent 273.0 235.0 14.20 132.0 3.4 8.3 22 14600 2.9 yes yes no poor yes yes ckd
154 56.0 90.0 1.005 4.0 3.0 abnormal abnormal notpresent notpresent 242.0 132.0 16.40 140.0 4.2 8.4 26 0 3.0 yes yes no poor yes yes ckd
155 50.0 70.0 1.020 3.0 0.0 abnormal normal present present 123.0 40.0 1.80 NaN NaN 11.1 36 4700 0.0 no no no good no no ckd
156 66.0 90.0 1.015 2.0 0.0 miss normal notpresent present 153.0 76.0 3.30 NaN NaN NaN 0 0 0.0 no no no poor no no ckd
157 62.0 70.0 1.025 3.0 0.0 normal abnormal notpresent notpresent 122.0 42.0 1.70 136.0 4.7 12.6 39 7900 3.9 yes yes no good no no ckd
158 71.0 60.0 1.020 3.0 2.0 normal normal present notpresent 424.0 48.0 1.50 132.0 4.0 10.9 31 0 0.0 yes yes yes good no no ckd
159 59.0 80.0 1.010 1.0 0.0 abnormal normal notpresent notpresent 303.0 35.0 1.30 122.0 3.5 10.4 35 10900 4.3 no yes no poor no no ckd
160 81.0 60.0 NaN NaN NaN miss miss notpresent notpresent 148.0 39.0 2.10 147.0 4.2 10.9 35 9400 2.4 yes yes yes poor yes no ckd
161 62.0 NaN 1.015 3.0 0.0 abnormal miss notpresent notpresent NaN NaN NaN NaN NaN 14.3 42 10200 4.8 yes yes no good no no ckd
162 59.0 70.0 NaN NaN NaN miss miss notpresent notpresent 204.0 34.0 1.50 124.0 4.1 9.8 37 6000 -999.0 no yes no good no no ckd
163 46.0 80.0 1.010 0.0 0.0 miss normal notpresent notpresent 160.0 40.0 2.00 140.0 4.1 9.0 27 8100 3.2 yes no no poor no yes ckd
164 14.0 NaN 1.015 0.0 0.0 miss miss notpresent notpresent 192.0 15.0 0.80 137.0 4.2 14.3 40 9500 5.4 no yes no poor yes no ckd
165 60.0 80.0 1.020 0.0 2.0 miss miss notpresent notpresent NaN NaN NaN NaN NaN NaN 0 0 0.0 no yes no good no no ckd
166 27.0 60.0 NaN NaN NaN miss miss notpresent notpresent 76.0 44.0 3.90 127.0 4.3 NaN 0 0 0.0 no no no poor yes yes ckd
167 34.0 70.0 1.020 0.0 0.0 abnormal normal notpresent notpresent 139.0 19.0 0.90 NaN NaN 12.7 42 2200 0.0 no no no poor no no ckd
168 65.0 70.0 1.015 4.0 4.0 miss normal present notpresent 307.0 28.0 1.50 NaN NaN 11.0 39 6700 0.0 yes yes no good no no ckd
169 NaN 70.0 1.010 0.0 2.0 miss normal notpresent notpresent 220.0 68.0 2.80 NaN NaN 8.7 27 0 0.0 yes yes no good no yes ckd
170 66.0 70.0 1.015 2.0 5.0 miss normal notpresent notpresent 447.0 41.0 1.70 131.0 3.9 12.5 33 9600 4.4 yes yes no good no no ckd
171 83.0 70.0 1.020 3.0 0.0 normal normal notpresent notpresent 102.0 60.0 2.60 115.0 5.7 8.7 26 12800 3.1 yes no no poor no yes ckd
172 62.0 80.0 1.010 1.0 2.0 miss miss notpresent notpresent 309.0 113.0 2.90 130.0 2.5 10.6 34 12800 4.9 no no no good no no ckd
173 17.0 70.0 1.015 1.0 0.0 abnormal normal notpresent notpresent 22.0 1.5 7.30 145.0 2.8 13.1 41 11200 0.0 no no no good no no ckd
174 54.0 70.0 NaN NaN NaN miss miss notpresent notpresent 111.0 146.0 7.50 141.0 4.7 11.0 35 8600 4.6 no no no good no no ckd
175 60.0 50.0 1.010 0.0 0.0 miss normal notpresent notpresent 261.0 58.0 2.20 113.0 3.0 NaN 0 4200 3.4 yes no no good no no ckd
176 21.0 90.0 1.010 4.0 0.0 normal abnormal present present 107.0 40.0 1.70 125.0 3.5 8.3 23 12400 3.9 no no no good no yes ckd
177 65.0 80.0 1.015 2.0 1.0 normal normal present notpresent 215.0 133.0 2.50 NaN NaN 13.2 41 0 0.0 no yes no good no no ckd
178 42.0 90.0 1.020 2.0 0.0 abnormal abnormal present notpresent 93.0 153.0 2.70 139.0 4.3 9.8 34 9800 0.0 no no no poor yes yes ckd
179 72.0 90.0 1.010 2.0 0.0 miss abnormal present notpresent 124.0 53.0 2.30 NaN NaN 11.9 39 0 0.0 no no no good no no ckd
180 73.0 90.0 1.010 1.0 4.0 abnormal abnormal present notpresent 234.0 56.0 1.90 NaN NaN 10.3 28 0 0.0 no yes no good no no ckd
181 45.0 70.0 1.025 2.0 0.0 normal abnormal present notpresent 117.0 52.0 2.20 136.0 3.8 10.0 30 19100 3.7 no no no good no no ckd
182 61.0 80.0 1.020 0.0 0.0 miss normal notpresent notpresent 131.0 23.0 0.80 140.0 4.1 11.3 35 0 0.0 no no no good no no ckd
183 30.0 70.0 1.015 0.0 0.0 miss normal notpresent notpresent 101.0 106.0 6.50 135.0 4.3 NaN 0 0 0.0 no no no poor no no ckd
184 54.0 60.0 1.015 3.0 2.0 miss abnormal notpresent notpresent 352.0 137.0 3.30 133.0 4.5 11.3 31 5800 3.6 yes yes yes poor yes no ckd
185 4.0 NaN 1.020 1.0 0.0 miss normal notpresent notpresent 99.0 23.0 0.60 138.0 4.4 12.0 34 -999 0.0 no no no good no no ckd
186 8.0 50.0 1.020 4.0 0.0 normal normal notpresent notpresent NaN 46.0 1.00 135.0 3.8 NaN 0 0 0.0 no no no good yes no ckd
187 3.0 NaN 1.010 2.0 0.0 normal normal notpresent notpresent NaN 22.0 0.70 NaN NaN 10.7 34 12300 0.0 no no no good no no ckd
188 8.0 NaN NaN NaN NaN miss miss notpresent notpresent 80.0 66.0 2.50 142.0 3.6 12.2 38 0 0.0 no \tno no good no no ckd
189 64.0 60.0 1.010 4.0 1.0 abnormal abnormal notpresent present 239.0 58.0 4.30 137.0 5.4 9.5 29 7500 3.4 yes yes no poor yes no ckd
190 6.0 60.0 1.010 4.0 0.0 abnormal abnormal notpresent present 94.0 67.0 1.00 135.0 4.9 9.9 30 16700 4.8 no no no poor no no ckd
191 NaN 70.0 1.010 3.0 0.0 normal normal notpresent notpresent 110.0 115.0 6.00 134.0 2.7 9.1 26 9200 3.4 yes yes no poor no no ckd
192 46.0 110.0 1.015 0.0 0.0 miss normal notpresent notpresent 130.0 16.0 0.90 NaN NaN NaN 0 0 0.0 no no no good no no ckd
193 32.0 90.0 1.025 1.0 0.0 abnormal abnormal notpresent notpresent NaN 223.0 18.10 113.0 6.5 5.5 15 2600 2.8 yes yes no poor yes yes ckd
194 80.0 70.0 1.010 2.0 NaN miss abnormal notpresent notpresent NaN 49.0 1.20 NaN NaN NaN 0 0 0.0 yes \tyes no good no no ckd
195 70.0 90.0 1.020 2.0 1.0 abnormal abnormal notpresent present 184.0 98.6 3.30 138.0 3.9 5.8 0 0 0.0 yes yes yes poor no no ckd
196 49.0 100.0 1.010 3.0 0.0 abnormal abnormal notpresent notpresent 129.0 158.0 11.80 122.0 3.2 8.1 24 9600 3.5 yes yes no poor yes yes ckd
197 57.0 80.0 NaN NaN NaN miss miss notpresent notpresent NaN 111.0 9.30 124.0 5.3 6.8 0 4300 3.0 yes yes no good no yes ckd
198 59.0 100.0 1.020 4.0 2.0 normal normal notpresent notpresent 252.0 40.0 3.20 137.0 4.7 11.2 30 26400 3.9 yes yes no poor yes no ckd
199 65.0 80.0 1.015 0.0 0.0 miss normal notpresent notpresent 92.0 37.0 1.50 140.0 5.2 8.8 25 10700 3.2 yes no yes good yes no ckd
200 90.0 90.0 1.025 1.0 0.0 miss normal notpresent notpresent 139.0 89.0 3.00 140.0 4.1 12.0 37 7900 3.9 yes yes no good no no ckd
201 64.0 70.0 NaN NaN NaN miss miss notpresent notpresent 113.0 94.0 7.30 137.0 4.3 7.9 21 0 0.0 yes yes yes good yes yes ckd
202 78.0 60.0 NaN NaN NaN miss miss notpresent notpresent 114.0 74.0 2.90 135.0 5.9 8.0 24 0 0.0 no yes no good no yes ckd
203 NaN 90.0 NaN NaN NaN miss miss notpresent notpresent 207.0 80.0 6.80 142.0 5.5 8.5 0 0 0.0 yes yes no good no yes ckd
204 65.0 90.0 1.010 4.0 2.0 normal normal notpresent notpresent 172.0 82.0 13.50 145.0 6.3 8.8 31 0 0.0 yes yes no good yes yes ckd
205 61.0 70.0 NaN NaN NaN miss miss notpresent notpresent 100.0 28.0 2.10 NaN NaN 12.6 43 0 0.0 yes yes no good no no ckd
206 60.0 70.0 1.010 1.0 0.0 miss normal notpresent notpresent 109.0 96.0 3.90 135.0 4.0 13.8 41 0 0.0 yes no no good no no ckd
207 50.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent 230.0 50.0 2.20 NaN NaN 12.0 41 10400 4.6 yes yes no good no no ckd
208 67.0 80.0 NaN NaN NaN miss miss notpresent notpresent 341.0 37.0 1.50 NaN NaN 12.3 41 6900 4.9 yes yes no good no yes ckd
209 19.0 70.0 1.020 0.0 0.0 miss normal notpresent notpresent NaN NaN NaN NaN NaN 11.5 0 6900 0.0 no no no good no no ckd
210 59.0 100.0 1.015 4.0 2.0 normal normal notpresent notpresent 255.0 132.0 12.80 135.0 5.7 7.3 20 9800 3.9 yes yes yes good no yes ckd
211 54.0 120.0 1.015 0.0 0.0 miss normal notpresent notpresent 103.0 18.0 1.20 NaN NaN NaN 0 0 0.0 no no no good no no ckd
212 40.0 70.0 1.015 3.0 4.0 normal normal notpresent notpresent 253.0 150.0 11.90 132.0 5.6 10.9 31 8800 3.4 yes yes no poor yes no ckd
213 55.0 80.0 1.010 3.0 1.0 normal abnormal present present 214.0 73.0 3.90 137.0 4.9 10.9 34 7400 3.7 yes yes no good yes no ckd
214 68.0 80.0 1.015 0.0 0.0 miss abnormal notpresent notpresent 171.0 30.0 1.00 NaN NaN 13.7 43 4900 5.2 no yes no good no no ckd
215 2.0 NaN 1.010 3.0 0.0 normal abnormal notpresent notpresent NaN NaN NaN NaN NaN NaN 0 0 0.0 no no no good yes no ckd
216 64.0 70.0 1.010 0.0 0.0 miss normal notpresent notpresent 107.0 15.0 NaN NaN NaN 12.8 38 0 0.0 no no no good no no ckd
217 63.0 100.0 1.010 1.0 0.0 miss normal notpresent notpresent 78.0 61.0 1.80 141.0 4.4 12.2 36 10500 4.3 no yes no good no no ckd
218 33.0 90.0 1.015 0.0 0.0 miss normal notpresent notpresent 92.0 19.0 0.80 NaN NaN 11.8 34 7000 0.0 no no no good no no ckd
219 68.0 90.0 1.010 0.0 0.0 miss normal notpresent notpresent 238.0 57.0 2.50 NaN NaN 9.8 28 8000 3.3 yes yes no poor no no ckd
220 36.0 80.0 1.010 0.0 0.0 miss normal notpresent notpresent 103.0 NaN NaN NaN NaN 11.9 36 8800 0.0 no no no good no no ckd
221 66.0 70.0 1.020 1.0 0.0 normal miss notpresent notpresent 248.0 30.0 1.70 138.0 5.3 NaN 0 0 0.0 yes yes no good no no ckd
222 74.0 60.0 NaN NaN NaN miss miss notpresent notpresent 108.0 68.0 1.80 NaN NaN NaN 0 0 0.0 yes yes no good no no ckd
223 71.0 90.0 1.010 0.0 3.0 miss normal notpresent notpresent 303.0 30.0 1.30 136.0 4.1 13.0 38 9200 4.6 yes yes no good no no ckd
224 34.0 60.0 1.020 0.0 0.0 miss normal notpresent notpresent 117.0 28.0 2.20 138.0 3.8 NaN 0 0 0.0 no no no good yes no ckd
225 60.0 90.0 1.010 3.0 5.0 abnormal normal notpresent present 490.0 95.0 2.70 131.0 3.8 11.5 35 12000 4.5 yes yes no good no no ckd
226 64.0 100.0 1.015 4.0 2.0 abnormal abnormal notpresent present 163.0 54.0 7.20 140.0 4.6 7.9 26 7500 3.4 yes yes no good yes no ckd
227 57.0 80.0 1.015 0.0 0.0 miss normal notpresent notpresent 120.0 48.0 1.60 NaN NaN 11.3 36 7200 3.8 yes yes no good no no ckd
228 60.0 70.0 NaN NaN NaN miss miss notpresent notpresent 124.0 52.0 2.50 NaN NaN NaN 0 0 0.0 yes no no good no no ckd
229 59.0 50.0 1.010 3.0 0.0 normal abnormal notpresent notpresent 241.0 191.0 12.00 114.0 2.9 9.6 31 15700 3.8 no yes no good yes no ckd
230 65.0 60.0 1.010 2.0 0.0 normal abnormal present notpresent 192.0 17.0 1.70 130.0 4.3 NaN 0 9500 0.0 yes yes no poor no no ckd\t
231 60.0 90.0 NaN NaN NaN miss miss notpresent notpresent 269.0 51.0 2.80 138.0 3.7 11.5 35 0 0.0 yes yes yes good yes no ckd
232 50.0 90.0 1.015 1.0 0.0 abnormal abnormal notpresent notpresent NaN NaN NaN NaN NaN NaN 0 0 0.0 no no no good yes no ckd
233 51.0 100.0 1.015 2.0 0.0 normal normal notpresent present 93.0 20.0 1.60 146.0 4.5 NaN 0 0 0.0 no no no poor no no ckd
234 37.0 100.0 1.010 0.0 0.0 abnormal normal notpresent notpresent NaN 19.0 1.30 NaN NaN 15.0 44 4100 5.2 yes no no good no no ckd
235 45.0 70.0 1.010 2.0 0.0 miss normal notpresent notpresent 113.0 93.0 2.30 NaN NaN 7.9 26 5700 0.0 no no yes good no yes ckd
236 65.0 80.0 NaN NaN NaN miss miss notpresent notpresent 74.0 66.0 2.00 136.0 5.4 9.1 25 0 0.0 yes yes yes good yes no ckd
237 80.0 70.0 1.015 2.0 2.0 miss normal notpresent notpresent 141.0 53.0 2.20 NaN NaN 12.7 40 9600 0.0 yes yes no poor yes no ckd
238 72.0 100.0 NaN NaN NaN miss miss notpresent notpresent 201.0 241.0 13.40 127.0 4.8 9.4 28 0 0.0 yes yes no good no yes ckd
239 34.0 90.0 1.015 2.0 0.0 normal normal notpresent notpresent 104.0 50.0 1.60 137.0 4.1 11.9 39 0 0.0 no no no good no no ckd
240 65.0 70.0 1.015 1.0 0.0 miss normal notpresent notpresent 203.0 46.0 1.40 NaN NaN 11.4 36 5000 4.1 yes yes no poor yes no ckd
241 57.0 70.0 1.015 1.0 0.0 miss abnormal notpresent notpresent 165.0 45.0 1.50 140.0 3.3 10.4 31 4200 3.9 no no no good no no ckd
242 69.0 70.0 1.010 4.0 3.0 normal abnormal present present 214.0 96.0 6.30 120.0 3.9 9.4 28 11500 3.3 yes yes yes good yes yes ckd
243 62.0 90.0 1.020 2.0 1.0 miss normal notpresent notpresent 169.0 48.0 2.40 138.0 2.9 13.4 47 11000 6.1 yes no no good no no ckd
244 64.0 90.0 1.015 3.0 2.0 miss abnormal present notpresent 463.0 64.0 2.80 135.0 4.1 12.2 40 9800 4.6 yes yes no good no yes ckd
245 48.0 100.0 NaN NaN NaN miss miss notpresent notpresent 103.0 79.0 5.30 135.0 6.3 6.3 19 7200 2.6 yes no yes poor no no ckd
246 48.0 110.0 1.015 3.0 0.0 abnormal normal present notpresent 106.0 215.0 15.20 120.0 5.7 8.6 26 5000 2.5 yes no yes good no yes ckd
247 54.0 90.0 1.025 1.0 0.0 normal abnormal notpresent notpresent 150.0 18.0 1.20 140.0 4.2 NaN 0 0 0.0 no no no poor yes yes ckd
248 59.0 70.0 1.010 1.0 3.0 abnormal abnormal notpresent notpresent 424.0 55.0 1.70 138.0 4.5 12.6 37 10200 4.1 yes yes yes good no no ckd
249 56.0 90.0 1.010 4.0 1.0 normal abnormal present notpresent 176.0 309.0 13.30 124.0 6.5 3.1 9 5400 2.1 yes yes no poor yes yes ckd
250 40.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 140.0 10.0 1.20 135.0 5.0 15.0 48 10400 4.5 no no no good no no notckd
251 23.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 70.0 36.0 1.00 150.0 4.6 17.0 52 9800 5.0 no no no good no no notckd
252 45.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 82.0 49.0 0.60 147.0 4.4 15.9 46 9100 4.7 no no no good no no notckd
253 57.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 119.0 17.0 1.20 135.0 4.7 15.4 42 6200 6.2 no no no good no no notckd
254 51.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 99.0 38.0 0.80 135.0 3.7 13.0 49 8300 5.2 no no no good no no notckd
255 34.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 121.0 27.0 1.20 144.0 3.9 13.6 52 9200 6.3 no no no good no no notckd
256 60.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 131.0 10.0 0.50 146.0 5.0 14.5 41 10700 5.1 no no no good no no notckd
257 38.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 91.0 36.0 0.70 135.0 3.7 14.0 46 9100 5.8 no no no good no no notckd
258 42.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 98.0 20.0 0.50 140.0 3.5 13.9 44 8400 5.5 no no no good no no notckd
259 35.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 104.0 31.0 1.20 135.0 5.0 16.1 45 4300 5.2 no no no good no no notckd
260 30.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 131.0 38.0 1.00 147.0 3.8 14.1 45 9400 5.3 no no no good no no notckd
261 49.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 122.0 32.0 1.20 139.0 3.9 17.0 41 5600 4.9 no no no good no no notckd
262 55.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 118.0 18.0 0.90 135.0 3.6 15.5 43 7200 5.4 no no no good no no notckd
263 45.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 117.0 46.0 1.20 137.0 5.0 16.2 45 8600 5.2 no no no good no no notckd
264 42.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 132.0 24.0 0.70 140.0 4.1 14.4 50 5000 4.5 no no no good no no notckd
265 50.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 97.0 40.0 0.60 150.0 4.5 14.2 48 10500 5.0 no no no good no no notckd
266 55.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 133.0 17.0 1.20 135.0 4.8 13.2 41 6800 5.3 no no no good no no notckd
267 48.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 122.0 33.0 0.90 146.0 3.9 13.9 48 9500 4.8 no no no good no no notckd
268 NaN 80.0 NaN NaN NaN miss miss notpresent notpresent 100.0 49.0 1.00 140.0 5.0 16.3 53 8500 4.9 no no no good no no notckd
269 25.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 121.0 19.0 1.20 142.0 4.9 15.0 48 6900 5.3 no no no good no no notckd
270 23.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 111.0 34.0 1.10 145.0 4.0 14.3 41 7200 5.0 no no no good no no notckd
271 30.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 96.0 25.0 0.50 144.0 4.8 13.8 42 9000 4.5 no no no good no no notckd
272 56.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 139.0 15.0 1.20 135.0 5.0 14.8 42 5600 5.5 no no no good no no notckd
273 47.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 95.0 35.0 0.90 140.0 4.1 NaN 0 0 0.0 no no no good no no notckd
274 19.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 107.0 23.0 0.70 141.0 4.2 14.4 44 0 0.0 no no no good no no notckd
275 52.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 125.0 22.0 1.20 139.0 4.6 16.5 43 4700 4.6 no no no good no no notckd
276 20.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent NaN NaN NaN 137.0 4.7 14.0 41 4500 5.5 no no no good no no notckd
277 46.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 123.0 46.0 1.00 135.0 5.0 15.7 50 6300 4.8 no no no good no no notckd
278 48.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 112.0 44.0 1.20 142.0 4.9 14.5 44 9400 6.4 no no no good no no notckd
279 24.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 140.0 23.0 0.60 140.0 4.7 16.3 48 5800 5.6 no no no good no no notckd
280 47.0 80.0 NaN NaN NaN miss miss notpresent notpresent 93.0 33.0 0.90 144.0 4.5 13.3 52 8100 5.2 no no no good no no notckd
281 55.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 130.0 50.0 1.20 147.0 5.0 15.5 41 9100 6.0 no no no good no no notckd
282 20.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 123.0 44.0 1.00 135.0 3.8 14.6 44 5500 4.8 no no no good no no notckd
283 60.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent NaN NaN NaN NaN NaN 16.4 43 10800 5.7 no no no good no no notckd
284 33.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 100.0 37.0 1.20 142.0 4.0 16.9 52 6700 6.0 no no no good no no notckd
285 66.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 94.0 19.0 0.70 135.0 3.9 16.0 41 5300 5.9 no no no good no no notckd
286 71.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 81.0 18.0 0.80 145.0 5.0 14.7 44 9800 6.0 no no no good no no notckd
287 39.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 124.0 22.0 0.60 137.0 3.8 13.4 43 0 0.0 no no no good no no notckd
288 56.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 70.0 46.0 1.20 135.0 4.9 15.9 50 11000 5.1 miss miss miss good no no notckd
289 42.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 93.0 32.0 0.90 143.0 4.7 16.6 43 7100 5.3 no no no good no no notckd
290 54.0 70.0 1.020 0.0 0.0 miss miss miss miss 76.0 28.0 0.60 146.0 3.5 14.8 52 8400 5.9 no no no good no no notckd
291 47.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 124.0 44.0 1.00 140.0 4.9 14.9 41 7000 5.7 no no no good no no notckd
292 30.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 89.0 42.0 0.50 139.0 5.0 16.7 52 10200 5.0 no no no good no no notckd
293 50.0 NaN 1.020 0.0 0.0 normal normal notpresent notpresent 92.0 19.0 1.20 150.0 4.8 14.9 48 4700 5.4 no no no good no no notckd
294 75.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 110.0 50.0 0.70 135.0 5.0 14.3 40 8300 5.8 no no no miss miss miss notckd
295 44.0 70.0 NaN NaN NaN miss miss notpresent notpresent 106.0 25.0 0.90 150.0 3.6 15.0 50 9600 6.5 no no no good no no notckd
296 41.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 125.0 38.0 0.60 140.0 5.0 16.8 41 6300 5.9 no no no good no no notckd
297 53.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 116.0 26.0 1.00 146.0 4.9 15.8 45 7700 5.2 miss miss miss good no no notckd
298 34.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 91.0 49.0 1.20 135.0 4.5 13.5 48 8600 4.9 no no no good no no notckd
299 73.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 127.0 48.0 0.50 150.0 3.5 15.1 52 11000 4.7 no no no good no no notckd
300 45.0 60.0 1.020 0.0 0.0 normal normal miss miss 114.0 26.0 0.70 141.0 4.2 15.0 43 9200 5.8 no no no good no no notckd
301 44.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 96.0 33.0 0.90 147.0 4.5 16.9 41 7200 5.0 no no no good no no notckd
302 29.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 127.0 44.0 1.20 145.0 5.0 14.8 48 0 0.0 no no no good no no notckd
303 55.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 107.0 26.0 1.10 NaN NaN 17.0 50 6700 6.1 no no no good no no notckd
304 33.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 128.0 38.0 0.60 135.0 3.9 13.1 45 6200 4.5 no no no good no no notckd
305 41.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 122.0 25.0 0.80 138.0 5.0 17.1 41 9100 5.2 no no no good no no notckd
306 52.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 128.0 30.0 1.20 140.0 4.5 15.2 52 4300 5.7 no no no good no no notckd
307 47.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 137.0 17.0 0.50 150.0 3.5 13.6 44 7900 4.5 no no no good no no notckd
308 43.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 81.0 46.0 0.60 135.0 4.9 13.9 48 6900 4.9 no no no good no no notckd
309 51.0 60.0 1.020 0.0 0.0 miss miss notpresent notpresent 129.0 25.0 1.20 139.0 5.0 17.2 40 8100 5.9 no no no good no no notckd
310 46.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 102.0 27.0 0.70 142.0 4.9 13.2 44 11000 5.4 no no no good no no notckd
311 56.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 132.0 18.0 1.10 147.0 4.7 13.7 45 7500 5.6 no no no good no no notckd
312 80.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent NaN NaN NaN 135.0 4.1 15.3 48 6300 6.1 no no no good no no notckd
313 55.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 104.0 28.0 0.90 142.0 4.8 17.3 52 8200 4.8 no no no good no no notckd
314 39.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 131.0 46.0 0.60 145.0 5.0 15.6 41 9400 4.7 no no no good no no notckd
315 44.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent NaN NaN NaN NaN NaN 13.8 48 7800 4.4 no no no good no no notckd
316 35.0 NaN 1.020 0.0 0.0 normal normal miss miss 99.0 30.0 0.50 135.0 4.9 15.4 48 5000 5.2 no no no good no no notckd
317 58.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 102.0 48.0 1.20 139.0 4.3 15.0 40 8100 4.9 no no no good no no notckd
318 61.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 120.0 29.0 0.70 137.0 3.5 17.4 52 7000 5.3 no no no good no no notckd
319 30.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 138.0 15.0 1.10 135.0 4.4 NaN 0 0 0.0 no no no good no no notckd
320 57.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 105.0 49.0 1.20 150.0 4.7 15.7 44 10400 6.2 no no no good no no notckd
321 65.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 109.0 39.0 1.00 144.0 3.5 13.9 48 9600 4.8 no no no good no no notckd
322 70.0 60.0 NaN NaN NaN miss miss notpresent notpresent 120.0 40.0 0.50 140.0 4.6 16.0 43 4500 4.9 no no no good no no notckd
323 43.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 130.0 30.0 1.10 143.0 5.0 15.9 45 7800 4.5 no no no good no no notckd
324 40.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 119.0 15.0 0.70 150.0 4.9 NaN 0 0 0.0 no no no good no no notckd
325 58.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 100.0 50.0 1.20 140.0 3.5 14.0 50 6700 6.5 no no no good no no notckd
326 47.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 109.0 25.0 1.10 141.0 4.7 15.8 41 8300 5.2 no no no good no no notckd
327 30.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 120.0 31.0 0.80 150.0 4.6 13.4 44 10700 5.8 no no no good no no notckd
328 28.0 70.0 1.020 0.0 0.0 normal normal miss miss 131.0 29.0 0.60 145.0 4.9 NaN 45 8600 6.5 no no no good no no notckd
329 33.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 80.0 25.0 0.90 146.0 3.5 14.1 48 7800 5.1 no no no good no no notckd
330 43.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 114.0 32.0 1.10 135.0 3.9 NaN 42 0 0.0 no no no good no no notckd
331 59.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 130.0 39.0 0.70 147.0 4.7 13.5 46 6700 4.5 no no no good no no notckd
332 34.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent NaN 33.0 1.00 150.0 5.0 15.3 44 10500 6.1 no no no good no no notckd
333 23.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 99.0 46.0 1.20 142.0 4.0 17.7 46 4300 5.5 no no no good no no notckd
334 24.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 125.0 NaN NaN 136.0 3.5 15.4 43 5600 4.5 no no no good no no notckd
335 60.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 134.0 45.0 0.50 139.0 4.8 14.2 48 10700 5.6 no no no good no no notckd
336 25.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 119.0 27.0 0.50 NaN NaN 15.2 40 9200 5.2 no no no good no no notckd
337 44.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 92.0 40.0 0.90 141.0 4.9 14.0 52 7500 6.2 no no no good no no notckd
338 62.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 132.0 34.0 0.80 147.0 3.5 17.8 44 4700 4.5 no no no good no no notckd
339 25.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 88.0 42.0 0.50 136.0 3.5 13.3 48 7000 4.9 no no no good no no notckd
340 32.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 100.0 29.0 1.10 142.0 4.5 14.3 43 6700 5.9 no no no good no no notckd
341 63.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 130.0 37.0 0.90 150.0 5.0 13.4 41 7300 4.7 no no no good no no notckd
342 44.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 95.0 46.0 0.50 138.0 4.2 15.0 50 7700 6.3 no no no good no no notckd
343 37.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 111.0 35.0 0.80 135.0 4.1 16.2 50 5500 5.7 no no no good no no notckd
344 64.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 106.0 27.0 0.70 150.0 3.3 14.4 42 8100 4.7 no no no good no no notckd
345 22.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 97.0 18.0 1.20 138.0 4.3 13.5 42 7900 6.4 no no no good no no notckd
346 33.0 60.0 NaN NaN NaN normal normal notpresent notpresent 130.0 41.0 0.90 141.0 4.4 15.5 52 4300 5.8 no no no good no no notckd
347 43.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 108.0 25.0 1.00 144.0 5.0 17.8 43 7200 5.5 no no no good no no notckd
348 38.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 99.0 19.0 0.50 147.0 3.5 13.6 44 7300 6.4 no no no good no no notckd
349 35.0 70.0 1.025 0.0 0.0 miss miss notpresent notpresent 82.0 36.0 1.10 150.0 3.5 14.5 52 9400 6.1 no no no good no no notckd
350 65.0 70.0 1.025 0.0 0.0 miss miss notpresent notpresent 85.0 20.0 1.00 142.0 4.8 16.1 43 9600 4.5 no no no good no no notckd
351 29.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 83.0 49.0 0.90 139.0 3.3 17.5 40 9900 4.7 no no no good no no notckd
352 37.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 109.0 47.0 1.10 141.0 4.9 15.0 48 7000 5.2 no no no good no no notckd
353 39.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 86.0 37.0 0.60 150.0 5.0 13.6 51 5800 4.5 no no no good no no notckd
354 32.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 102.0 17.0 0.40 147.0 4.7 14.6 41 6800 5.1 no no no good no no notckd
355 23.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 95.0 24.0 0.80 145.0 5.0 15.0 52 6300 4.6 no no no good no no notckd
356 34.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 87.0 38.0 0.50 144.0 4.8 17.1 47 7400 6.1 no no no good no no notckd
357 66.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 107.0 16.0 1.10 140.0 3.6 13.6 42 11000 4.9 no no no good no no notckd
358 47.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 117.0 22.0 1.20 138.0 3.5 13.0 45 5200 5.6 no no no good no no notckd
359 74.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 88.0 50.0 0.60 147.0 3.7 17.2 53 6000 4.5 no no no good no no notckd
360 35.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 105.0 39.0 0.50 135.0 3.9 14.7 43 5800 6.2 no no no good no no notckd
361 29.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 70.0 16.0 0.70 138.0 3.5 13.7 54 5400 5.8 no no no good no no notckd
362 33.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 89.0 19.0 1.10 144.0 5.0 15.0 40 10300 4.8 no no no good no no notckd
363 67.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 99.0 40.0 0.50 NaN NaN 17.8 44 5900 5.2 no no no good no no notckd
364 73.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 118.0 44.0 0.70 137.0 3.5 14.8 45 9300 4.7 no no no good no no notckd
365 24.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 93.0 46.0 1.00 145.0 3.5 NaN 0 10700 6.3 no no no good no no notckd
366 60.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 81.0 15.0 0.50 141.0 3.6 15.0 46 10500 5.3 no no no good no no notckd
367 68.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 125.0 41.0 1.10 139.0 3.8 17.4 50 6700 6.1 no no no good no no notckd
368 30.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 82.0 42.0 0.70 146.0 5.0 14.9 45 9400 5.9 no no no good no no notckd
369 75.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 107.0 48.0 0.80 144.0 3.5 13.6 46 10300 4.8 no no no good no no notckd
370 69.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 83.0 42.0 1.20 139.0 3.7 16.2 50 9300 5.4 no no no good no no notckd
371 28.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 79.0 50.0 0.50 145.0 5.0 17.6 51 6500 5.0 no no no good no no notckd
372 72.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 109.0 26.0 0.90 150.0 4.9 15.0 52 10500 5.5 no no no good no no notckd
373 61.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 133.0 38.0 1.00 142.0 3.6 13.7 47 9200 4.9 no no no good no no notckd
374 79.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 111.0 44.0 1.20 146.0 3.6 16.3 40 8000 6.4 no no no good no no notckd
375 70.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 74.0 41.0 0.50 143.0 4.5 15.1 48 9700 5.6 no no no good no no notckd
376 58.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 88.0 16.0 1.10 147.0 3.5 16.4 53 9100 5.2 no no no good no no notckd
377 64.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 97.0 27.0 0.70 145.0 4.8 13.8 49 6400 4.8 no no no good no no notckd
378 71.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent NaN NaN 0.90 140.0 4.8 15.2 42 7700 5.5 no no no good no no notckd
379 62.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 78.0 45.0 0.60 138.0 3.5 16.1 50 5400 5.7 no no no good no no notckd
380 59.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 113.0 23.0 1.10 139.0 3.5 15.3 54 6500 4.9 no no no good no no notckd
381 71.0 70.0 1.025 0.0 0.0 miss miss notpresent notpresent 79.0 47.0 0.50 142.0 4.8 16.6 40 5800 5.9 no no no good no no notckd
382 48.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 75.0 22.0 0.80 137.0 5.0 16.8 51 6000 6.5 no no no good no no notckd
383 80.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 119.0 46.0 0.70 141.0 4.9 13.9 49 5100 5.0 no no no good no no notckd
384 57.0 60.0 1.020 0.0 0.0 normal normal notpresent notpresent 132.0 18.0 1.10 150.0 4.7 15.4 42 11000 4.5 no no no good no no notckd
385 63.0 70.0 1.020 0.0 0.0 normal normal notpresent notpresent 113.0 25.0 0.60 146.0 4.9 16.5 52 8000 5.1 no no no good no no notckd
386 46.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 100.0 47.0 0.50 142.0 3.5 16.4 43 5700 6.5 no no no good no no notckd
387 15.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 93.0 17.0 0.90 136.0 3.9 16.7 50 6200 5.2 no no no good no no notckd
388 51.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 94.0 15.0 1.20 144.0 3.7 15.5 46 9500 6.4 no no no good no no notckd
389 41.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 112.0 48.0 0.70 140.0 5.0 17.0 52 7200 5.8 no no no good no no notckd
390 52.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 99.0 25.0 0.80 135.0 3.7 15.0 52 6300 5.3 no no no good no no notckd
391 36.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 85.0 16.0 1.10 142.0 4.1 15.6 44 5800 6.3 no no no good no no notckd
392 57.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 133.0 48.0 1.20 147.0 4.3 14.8 46 6600 5.5 no no no good no no notckd
393 43.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 117.0 45.0 0.70 141.0 4.4 13.0 54 7400 5.4 no no no good no no notckd
394 50.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 137.0 46.0 0.80 139.0 5.0 14.1 45 9500 4.6 no no no good no no notckd
395 55.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 140.0 49.0 0.50 150.0 4.9 15.7 47 6700 4.9 no no no good no no notckd
396 42.0 70.0 1.025 0.0 0.0 normal normal notpresent notpresent 75.0 31.0 1.20 141.0 3.5 16.5 54 7800 6.2 no no no good no no notckd
397 12.0 80.0 1.020 0.0 0.0 normal normal notpresent notpresent 100.0 26.0 0.60 137.0 4.4 15.8 49 6600 5.4 no no no good no no notckd
398 17.0 60.0 1.025 0.0 0.0 normal normal notpresent notpresent 114.0 50.0 1.00 135.0 4.9 14.2 51 7200 5.9 no no no good no no notckd
399 58.0 80.0 1.025 0.0 0.0 normal normal notpresent notpresent 131.0 18.0 1.10 141.0 3.5 15.8 53 6800 6.1 no no no good no no notckd
In [33]:
# some further cleaning is required to remove the \t characters is a couple of columns replacing the instances with the standard formating
# classification, cad,  dm
df_clean['classification'] = df_clean['classification'].replace("ckd\t","ckd")
df_clean['cad'] = df_clean['cad'].replace("\tno","no")
df_clean['dm'] = df_clean['dm'].replace("\tno","no")
df_clean['dm'] = df_clean['dm'].replace("\tyes", "yes")
df_clean['dm'] = df_clean['dm'].replace(" yes", "yes")
In [34]:
# subsetting columns with another boolean mask for categorical columns and object columns
cat_mask_obj2 = (df_clean.dtypes == "object") | (df_clean.dtypes == "category")

# Get list of categorical column names
cat_mask_object2 = df_clean.columns[cat_mask_obj2].tolist()

# remove the column classification
cat_mask_object2.remove("classification")

# see what columns are left
print(cat_mask_object2)
['rbc', 'pc', 'pcc', 'ba', 'htn', 'dm', 'cad', 'appet', 'pe', 'ane']
In [35]:
# look into the XGBoost course to figure out how the categorical imputer works
# combine everything and use DictVectorizer for one hot encoding and label encoding

# conversion of our dataframe into a dictionary so as to use DictVectorizer
# this function is mostly used in text processing
df_dict = df_clean[cat_mask_object2].to_dict("records")

# Make a DictVectorizer: use documentation to learn how it works
# In short, it speeds up one hot encoding with meaningful columns created
# we don't want a sparse matrix right?
dv = DictVectorizer(sparse = False)

# Apply fit_transform to our dataset
df_encoded = dv.fit_transform(df_dict)

# see 10 rows
print (df_encoded[:10,:])
print ("=" * 100) # just formatting to distinguish outputs

# print the vocabulary that is, the columns of the dataset, note that order changes
# upon transformation
print(dv.vocabulary_)
print ("=" * 100) # more formatting

print(df_encoded.shape) # number of rows and columns for the encoded dataset
print(df_clean[cat_mask_object2].shape) # number of rows and columns for the original dataset
print("After doing the transformation the columns increase to 21.")
[[0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 1. 0.
  0. 1. 0. 0. 1. 0.]
 [0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
  0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 1. 0.
  0. 1. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1.
  0. 0. 1. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
  0. 1. 0. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
  0. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0.
  0. 1. 0. 0. 1. 0.]
 [0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 1. 0.
  0. 0. 1. 0. 0. 1.]
 [0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1.
  0. 1. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1.
  0. 1. 0. 1. 0. 0.]]
====================================================================================================
{'rbc=miss': 28, 'pc=normal': 20, 'pcc=notpresent': 22, 'ba=notpresent': 7, 'htn=yes': 17, 'dm=yes': 14, 'cad=no': 10, 'appet=good': 3, 'pe=no': 25, 'ane=no': 1, 'htn=no': 16, 'dm=no': 13, 'rbc=normal': 29, 'appet=poor': 5, 'ane=yes': 2, 'pc=abnormal': 18, 'pcc=present': 23, 'pe=yes': 26, 'pc=miss': 19, 'rbc=abnormal': 27, 'cad=yes': 11, 'ba=present': 8, 'htn=miss': 15, 'dm=miss': 12, 'cad=miss': 9, 'pcc=miss': 21, 'ba=miss': 6, 'appet=miss': 4, 'pe=miss': 24, 'ane=miss': 0}
====================================================================================================
(400, 30)
(400, 10)
After doing the transformation the columns increase to 21.
In [36]:
# You can try
# make a pipeline to merge the encoding as well as the visualization
# Use t-SNE and or PCA to see the differences between groups this will be the EDA step
# make a train and test split go through the slides for how to win kaggle competitions and test the ideas
# make the next step a pipeline object like in the xgboost course and try random forest, xgboost and decision tree classifier
# later use the ensembling techniques: Try all the ensembling techniques you know.
In [37]:
# see the transformed dataframe with all the missing values imputed
df_clean[cat_mask_numeric]
Out[37]:
age bp sg al su bgr bu sc sod pot hemo
id
0 48.0 80.0 1.020 1.0 0.0 121.0 36.0 1.2 0.0 0.0 15.4
1 7.0 50.0 1.020 4.0 0.0 0.0 18.0 0.8 0.0 0.0 11.3
2 62.0 80.0 1.010 2.0 3.0 423.0 53.0 1.8 0.0 0.0 9.6
3 48.0 70.0 1.005 4.0 0.0 117.0 56.0 3.8 111.0 2.5 11.2
4 51.0 80.0 1.010 2.0 0.0 106.0 26.0 1.4 0.0 0.0 11.6
... ... ... ... ... ... ... ... ... ... ... ...
395 55.0 80.0 1.020 0.0 0.0 140.0 49.0 0.5 150.0 4.9 15.7
396 42.0 70.0 1.025 0.0 0.0 75.0 31.0 1.2 141.0 3.5 16.5
397 12.0 80.0 1.020 0.0 0.0 100.0 26.0 0.6 137.0 4.4 15.8
398 17.0 60.0 1.025 0.0 0.0 114.0 50.0 1.0 135.0 4.9 14.2
399 58.0 80.0 1.025 0.0 0.0 131.0 18.0 1.1 141.0 3.5 15.8

400 rows × 11 columns

In [38]:
# simply taking the vectorized columns and the numeric columns and bringing them together
# to make an array for a classifier
concat_cols = np.hstack((df_encoded, df_clean[cat_mask_numeric].values))

# another version that is in dataframe format
# make a dataframe with the encoded features and give the columns names from the dictVectorizer
df_cat_var = pd.DataFrame(df_encoded, columns=dv.get_feature_names_out())

# combine the columns together with the categorical features i.e add columns to the numerical dataframe with other dataframe with the categorical and object data types
concat_cols_df = pd.concat([df_clean[cat_mask_numeric], df_cat_var], axis=1)
concat_cols.shape
Out[38]:
(400, 41)
In [39]:
# the final dataframe we'll use for classification
concat_cols_df
Out[39]:
age bp sg al su bgr bu sc sod pot ... pc=normal pcc=miss pcc=notpresent pcc=present pe=miss pe=no pe=yes rbc=abnormal rbc=miss rbc=normal
0 48.0 80.0 1.020 1.0 0.0 121.0 36.0 1.2 0.0 0.0 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
1 7.0 50.0 1.020 4.0 0.0 0.0 18.0 0.8 0.0 0.0 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
2 62.0 80.0 1.010 2.0 3.0 423.0 53.0 1.8 0.0 0.0 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
3 48.0 70.0 1.005 4.0 0.0 117.0 56.0 3.8 111.0 2.5 ... 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0
4 51.0 80.0 1.010 2.0 0.0 106.0 26.0 1.4 0.0 0.0 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
395 55.0 80.0 1.020 0.0 0.0 140.0 49.0 0.5 150.0 4.9 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
396 42.0 70.0 1.025 0.0 0.0 75.0 31.0 1.2 141.0 3.5 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
397 12.0 80.0 1.020 0.0 0.0 100.0 26.0 0.6 137.0 4.4 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
398 17.0 60.0 1.025 0.0 0.0 114.0 50.0 1.0 135.0 4.9 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
399 58.0 80.0 1.025 0.0 0.0 131.0 18.0 1.1 141.0 3.5 ... 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0

400 rows × 41 columns

In [40]:
# this operation is not necessary but it's good to see the final dataframe
#pd.set_option('future.no_silent_downcasting', True)
In [41]:
# now get the target variable into a numeric form
# there's a simpler step where you can use map instead  y = df_clean["classification"].map(lambda val1: 1 if val1 == "ckd" else 0)
# y = y.values
col_preprocess = df_clean["classification"].replace("ckd", 1)
final_col_preprocess = col_preprocess.replace("notckd", 0)
y = final_col_preprocess.values
print(y)
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
/tmp/ipykernel_814016/899742435.py:5: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  final_col_preprocess = col_preprocess.replace("notckd", 0)
In [42]:
# confirm if the shape of the vector and matrix
print(concat_cols.shape)
print(y.shape)
(400, 41)
(400,)
In [43]:
# now that we have both matrices we can see the distribution of the target variable to know what to do next
# when it comes to preprocessing
final_col_preprocess.reset_index()["classification"].value_counts(normalize=True)
Out[43]:
classification
1    0.625
0    0.375
Name: proportion, dtype: float64

Split the dataframe to 3 for more training and compare result with the same configuration set¶

In [44]:
# 3 split to evaluate model performance: do this to evaluate models better
features_train, features_validation_test, labels_train, labels_validation_test = train_test_split(concat_cols_df, y, test_size=0.4, random_state=100)
features_validation, features_test, labels_validation, labels_test = train_test_split( features_validation_test, labels_validation_test, test_size=0.5, random_state=100)
In [45]:
# The patients with chronic kidney disease are more than those who don't have .63 ckd and .38 for nonckd
# The dataset is imbalanced, we can't use the regular accuracy as an evaluation metric instead confusion matrix and F1 score
# we don't want more of the train set in either so I guess stratify is good option
# I changed the 0.5:0.5 to 0.75:0.25
x_train, x_test, y_train, y_test = train_test_split(concat_cols_df, y, test_size = 0.25, stratify = y, random_state=1243)
In [46]:
# Check if the dimensionality is the same for the feature and target set (train)
print("Is the number of rows the same between the features and the target?")
assert x_train.shape[0] == y_train.shape[0]
print (True)
Is the number of rows the same between the features and the target?
True
In [47]:
# Check if the dimensionality is the same for the feature and target set (test)
print("Is the number of rows the same between the features and the target?")
assert x_test.shape[0] == y_test.shape[0]
print (True)
Is the number of rows the same between the features and the target?
True
In [48]:
# Now checking if the target variable is balanced in the train set
pd.Series(y_train).value_counts(normalize=True)
Out[48]:
1    0.623333
0    0.376667
Name: proportion, dtype: float64
In [49]:
# They are still unbalanced now. Therefore, will have to use the f1 score and change the class weight of the algorithms used like logistic regression
In [50]:
# look at the instances of the labels 0 and 1
pd.Series(y_test).value_counts(normalize=True)
Out[50]:
1    0.63
0    0.37
Name: proportion, dtype: float64
In [51]:
# convert all the target variables to integers
y_train = y_train.astype(int)
y_test = y_test.astype(int)
In [52]:
# normal scikit learn paradigm of specify, fit and predict for logistic regression or continuos perceptron
clf_lr1 = LogisticRegression(class_weight="balanced", random_state=1243, max_iter=1000)

clf_lr1.fit(x_train,y_train)

preds1 = clf_lr1.predict(x_test)

# using f1 score instead of other metrics
score_vote1 = f1_score(preds1, y_test)
print('F1-Score: {:.3f}'.format(score_vote1))

# Calculate the classification report
report1 = classification_report(y_test, preds1,target_names=["notckd", "ckd"])
print(report1)
F1-Score: 0.992
              precision    recall  f1-score   support

      notckd       0.97      1.00      0.99        37
         ckd       1.00      0.98      0.99        63

    accuracy                           0.99       100
   macro avg       0.99      0.99      0.99       100
weighted avg       0.99      0.99      0.99       100

In [53]:
# Specify the Decision tree classifier: asks a bunch of if-else statements to come up with a decision
# adjust the number of min samples leaf based on the game 20 questions (From deep learning for coders by Jeremy Howard)
clf_dt2 = DecisionTreeClassifier(class_weight = "balanced",random_state=1243)

clf_dt2.fit(x_train,y_train)

preds2 = clf_dt2.predict(x_test)

score_vote2 = f1_score(preds2, y_test)
print('F1-Score: {:.3f}'.format(score_vote2))

# Calculate the classification report
report2 = classification_report(y_test, preds2, target_names=["notckd", "ckd"])
print(report2)
F1-Score: 0.959
              precision    recall  f1-score   support

      notckd       0.88      1.00      0.94        37
         ckd       1.00      0.92      0.96        63

    accuracy                           0.95       100
   macro avg       0.94      0.96      0.95       100
weighted avg       0.96      0.95      0.95       100

In [54]:
# check the parameters of the decision tree classifier in new lines to see what you can change
clf_dt2.get_params().keys()
Out[54]:
dict_keys(['ccp_alpha', 'class_weight', 'criterion', 'max_depth', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_samples_leaf', 'min_samples_split', 'min_weight_fraction_leaf', 'monotonic_cst', 'random_state', 'splitter'])
In [55]:
# normalize data functions
# log, z scores(standardized_data), dimensionality reduction with PCA(dim_reduction) and making a function that combines another function(compose2)
def skew (data):
    skewed_data = np.log(data)
    return skewed_data


def standardized_data(data):
    scaler = StandardScaler()
    scaler.fit(data)
    scaled_data = scaler.transform(data)
    return scaled_data

def dim_reduction(data):
    pca = PCA(n_components=2)
    return pca.fit_transform(data)


def compose2(f, g):
    return lambda *a, **kw: f(g(*a, **kw))

def compose(*fs):
    return reduce(compose2, fs)


# returns an error due to the points being too small
normalize_data = compose2(skew, standardized_data)

# mean of 0 and a std 1 for all the columns
scaled_x_train = standardized_data(x_train)
scaled_x_test = standardized_data(x_test)

# add for the smaller datasets
scaled_features_validation = standardized_data(features_validation)
scaled_features_test = standardized_data(features_test)

# reduce dimensionality to 2
transform_data = compose2(standardized_data, dim_reduction)
dim_red_x_train = transform_data(x_train)
dim_red_x_test = transform_data(x_test)
In [56]:
# Logistic regression just a modified perceptron algorithm that uses the sigmoid function therefore an continous perceptron
clf_lr3 = LogisticRegression(class_weight="balanced",random_state=1243)

clf_lr3.fit(scaled_x_train,y_train)

preds3 = clf_lr3.predict(scaled_x_test)

score_vote3 = f1_score(preds3, y_test)
print('F1-Score: {:.3f}'.format(score_vote3))

# Calculate the classification report
report3= classification_report(y_test, preds3,target_names=["notckd", "ckd"])
print(report3)
F1-Score: 0.976
              precision    recall  f1-score   support

      notckd       0.93      1.00      0.96        37
         ckd       1.00      0.95      0.98        63

    accuracy                           0.97       100
   macro avg       0.96      0.98      0.97       100
weighted avg       0.97      0.97      0.97       100

In [57]:
# Logistic regression but the features have been compressed
clf_lr5 = LogisticRegression(class_weight="balanced",random_state=1243)

clf_lr5.fit(dim_red_x_train,y_train)

preds5 = clf_lr5.predict(dim_red_x_test)

score_vote5 = f1_score(preds5, y_test)
print('F1-Score: {:.3f}'.format(score_vote5))

# Make a classification report
report5 = classification_report(y_test, preds5,target_names=["notckd", "ckd"])
print(report5)
F1-Score: 0.603
              precision    recall  f1-score   support

      notckd       0.40      0.51      0.45        37
         ckd       0.66      0.56      0.60        63

    accuracy                           0.54       100
   macro avg       0.53      0.53      0.53       100
weighted avg       0.57      0.54      0.55       100

In [58]:
# Decision tree classifier but with scaled features
clf_dt4 = DecisionTreeClassifier(class_weight="balanced", random_state=1243, min_samples_leaf=25)

clf_dt4.fit(scaled_x_train,y_train)

preds4 = clf_dt4.predict(scaled_x_test)

score_vote4 = f1_score(preds4, y_test)
print('F1-Score: {:.3f}'.format(score_vote4))

# Make a classification report
report4 = classification_report(y_test, preds4, target_names=["notckd", "ckd"],)
print(report4)
F1-Score: 0.879
              precision    recall  f1-score   support

      notckd       0.74      0.95      0.83        37
         ckd       0.96      0.81      0.88        63

    accuracy                           0.86       100
   macro avg       0.85      0.88      0.86       100
weighted avg       0.88      0.86      0.86       100

In [59]:
# take the coefficient and see the dimension
clf_lr1.coef_.shape
Out[59]:
(1, 41)

To do: Tests for overfitting¶

In [ ]:
# helps with visualizing the decision function for the classifier
def plot_points(features, labels):
    '''

    '''
    X = np.array(features) # convert data into an numpy array: features
    y = np.array(labels) # convert data into an numpy array: labels
    ckd = X[np.argwhere(y==1)] # get all instances where the features are for individuals with ckd
    notckd = X[np.argwhere(y==0)] # get all instances where the features are for individuals without ckd
    plt.scatter([s[0][0] for s in ckd],
                [s[0][1] for s in ckd],
                s = 30,
                color = 'cyan',
                edgecolor = 'k',
                marker = '^')
    plt.scatter([s[0][0] for s in notckd],
                [s[0][1] for s in notckd],
                s = 30,
                color = 'red',
                edgecolor = 'k',
                marker = 's')
    plt.xlabel('aack')
    plt.ylabel('beep')
    plt.legend(['ckd','notckd'])
def draw_line(a,b,c, color='black', linewidth=2.0, linestyle='solid', starting=0, ending=3):
    # Plotting the line ax + by + c = 0
    x = np.linspace(starting, ending, 1000)
    plt.plot(x, -c/b - a*x/b, linestyle=linestyle, color=color, linewidth=linewidth)
In [61]:
# Trying to visualize the function but this didn't work so well
X = np.array(concat_cols)
y = np.array(y)
ckd = X[np.argwhere(y==0)]
notckd = X[np.argwhere(y==1)]

plt.scatter([s[0][0] for s in ckd],
                [s[0][1] for s in ckd],
                s = 25,
                color = 'cyan',
                edgecolor = 'k',
                marker = '^')
plt.scatter([s[0][0] for s in notckd],
                [s[0][1] for s in notckd],
                s = 25,
                color = 'red',
                edgecolor = 'k',
                marker = 's')
plt.xlabel('ckd')
plt.ylabel('notckd')
plt.legend(['ckd','notckd'])
Out[61]:
<matplotlib.legend.Legend at 0x7576f7874430>
In [62]:
# This needs some fixing: Please ignore this for now.
plot_points(scaled_x_train, y_train)
draw_line(1,1, clf_lr1.fit_intercept)
In [63]:
# Check this out https://github.com/luisguiserrano/manning/blob/master/Chapter%205%20-%20Logistic%20Regression/Coding%20the%20Logistic%20Regression%20Algorithm.ipynb
In [60]:
%matplotlib inline
In [61]:
# plotting feature importance for the Decision tree
# grab the column names as a list

features = concat_cols_df.columns

# get the feature importances
important_features = clf_dt2.feature_importances_

# find the indices of a sorted array
feature_indices = np.argsort(important_features)

# make a plot
plt.title('Feature Importances Decision Tree')
plt.xticks(fontsize=6, rotation = 45)
plt.barh(range(len(feature_indices)), important_features[feature_indices], color='g', align='center')
plt.yticks(range(len(feature_indices)), [features[i] for i in feature_indices], fontsize = 6)
plt.xlabel('Relative Importance')
plt.show()
No description has been provided for this image
In [62]:
# Reviewing feature importance using the logistic regression and the C parameter
# grab the coefficients and transpose the array
# label the C parameter
plt.plot(np.sort(clf_lr1.coef_.T), 'o', label="C=1",color = "g")
plt.xticks(range(concat_cols_df.shape[1]), concat_cols_df.columns, rotation=90)
plt.hlines(0, 0, concat_cols_df.shape[1])
plt.title("Examination of feature importance")
plt.xlabel("Coefficient index")
plt.ylabel("Coefficient magnitude")
plt.legend()
Out[62]:
<matplotlib.legend.Legend at 0x73368de40610>
No description has been provided for this image
In [54]:
# Draw a feature importance plot for the logistic regression in the same way
plt.title('Feature Importances Logistic Regression')
plt.xticks(fontsize=6, rotation = 45)
plt.barh(range(len(feature_indices)), clf_lr1.coef_[0][feature_indices], color='g', align='center')
plt.yticks(range(len(feature_indices)), [features[i] for i in feature_indices], fontsize = 6)
plt.xlabel('Relative Importance')
plt.show()
No description has been provided for this image
In [67]:
#!pip install pydotplus -q
In [63]:
# draw the decision tree
# add more comments for this
import six
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus

dot_data = six.StringIO()
export_graphviz(clf_dt2, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names = concat_cols_df.columns, class_names =["notckd", "ckd"])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

# look at hemo(hemoglobin), sg(specific gravity), al(albumin), sod(sodium), rbc=normal(red blood cells), htn=yes(hypertension), bu(blood urea)
# dm (diabetes mellitus)
Out[63]:
No description has been provided for this image

Let's see the plot abit differently in markdown format to help us export this work to a report or maybe attach it to Data Version Control with less effort.

In [64]:
from sklearn.tree import export_text

rules = export_text(clf_dt2, feature_names=list(concat_cols_df.columns))

print(rules)
|--- hemo <= 12.85
|   |--- sod <= 143.50
|   |   |--- sg <= 1.02
|   |   |   |--- class: 1
|   |   |--- sg >  1.02
|   |   |   |--- hemo <= 2.90
|   |   |   |   |--- rbc=normal <= 0.50
|   |   |   |   |   |--- pcc=present <= 0.50
|   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |--- pcc=present >  0.50
|   |   |   |   |   |   |--- class: 1
|   |   |   |   |--- rbc=normal >  0.50
|   |   |   |   |   |--- class: 0
|   |   |   |--- hemo >  2.90
|   |   |   |   |--- class: 1
|   |--- sod >  143.50
|   |   |--- age <= 43.00
|   |   |   |--- class: 0
|   |   |--- age >  43.00
|   |   |   |--- pot <= 4.35
|   |   |   |   |--- class: 1
|   |   |   |--- pot >  4.35
|   |   |   |   |--- class: 1
|--- hemo >  12.85
|   |--- sg <= 1.02
|   |   |--- sg <= 0.50
|   |   |   |--- htn=yes <= 0.50
|   |   |   |   |--- class: 0
|   |   |   |--- htn=yes >  0.50
|   |   |   |   |--- class: 1
|   |   |--- sg >  0.50
|   |   |   |--- class: 1
|   |--- sg >  1.02
|   |   |--- dm=yes <= 0.50
|   |   |   |--- class: 0
|   |   |--- dm=yes >  0.50
|   |   |   |--- class: 1

Here we see the decision tree in another representation that could save memory especially if working with data version control systems. It is like running a profiler on the data. We can see that the decision tree is not overfitting. The model is able to classify the data well.

In [57]:
# pertubation test
# this is a test to see how robust the model is
from sklearn.inspection import permutation_importance

# this is a test to see how robust the model is
result = permutation_importance(clf_dt2, x_test, y_test, n_repeats=10, random_state=1243)

sorted_idx = result.importances_mean.argsort()
plt.barh(range(len(sorted_idx)), result.importances_mean[sorted_idx], color='g', align='center')
plt.yticks(range(len(sorted_idx)), [features[i] for i in sorted_idx], fontsize = 6)
plt.xlabel('Relative Importance')
plt.title('Permutation Importances Decision Tree')
Out[57]:
Text(0.5, 1.0, 'Permutation Importances Decision Tree')
No description has been provided for this image

With the permutation test we are trying to find out we are trying to add noise to the data by randomly shuffling the data while keeping others constant. This helps us evaluate if we will have the same features as importance. Think of it as doing the experiment multiple times. The top features are retained: hemo (hemogloblin), dm(diabetes mellitus) and sp (specific gravity). The rbc=normal seems to be downregulating the prediction and hence, it is safe to remove it. However, maybe the dataset shuffle might have affected the results. Now, lets estimate the shap values.

In [ ]:
# using shap values to compare the feature importance
# if they are stable

import shap
from sklearn.metrics.pairwise import cosine_similarity

# calculate shap values for logistic regression
# Wrap the model's predict_proba method in a callable function
def model_predict_proba(X):
	return clf_lr1.predict_proba(X)

# Create the SHAP explainer using the callable function
explainer = shap.Explainer(model_predict_proba, x_train)
shap_values = explainer(x_train)

# summarize the effects of all the features (reduce to 2D by averaging over the class dimension)
feature_imp_mod1 = np.mean(np.abs(shap_values.values).sum(axis=1), axis=0)


# calculate shap values for decision tree
explainer2 = shap.Explainer(clf_dt2)
shap_values2 = explainer2.shap_values(x_train)
# summarize the effects of all the features (reduce to 2D by averaging over the class dimension)
feature_imp_mod2 = np.mean(np.abs(shap_values2).sum(axis=1), axis=0)

# consistency calculation
consistency = cosine_similarity([feature_imp_mod1], [feature_imp_mod2])

print("The consistency of the two models is: ", consistency)
The consistency of the two models is:  [[1.]]

Shap values help us understand exactly how the model is making predictions. It helps us understand the features that are important in making the prediction. We have attempted to compare both model predictions and generated shap values. The cosine similiarity between the the logistic regression and the decision tree are 1 this means the features being taken into consideration are the same. In addition, the features have robust predictive power regardless of model choice. Which is an unusual result.

In [68]:
# Calculating model faitfulness by changing features and seeing if predictions change
try_pred = clf_dt2.predict(x_test)

# check the predictions
print("Original predictions: ", try_pred)

# changing the features on hemo comparing versus original
# ranges between 3.1 to 17.8 decisively 12.85
x_test2 = x_test.copy()

x_test2.iloc[:, 0] = 12.85
# check the predictions
preds2 = clf_dt2.predict(x_test2)

# compare the predictions with the actual y_test values
# Calculate cosine similarity between original predictions and actual y_test
similarity_original = cosine_similarity([try_pred], [y_test])

# Calculate cosine similarity between modified predictions and actual y_test
similarity_modified = cosine_similarity([preds2], [y_test])

print("Cosine similarity (original predictions vs actual): ", similarity_original[0][0])
print("Cosine similarity (modified predictions vs actual): ", similarity_modified[0][0])
Original predictions:  [1 0 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 0 0 0 0 1 1
 0 1 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 0 0 1 1 1
 1 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 0]
Cosine similarity (original predictions vs actual):  0.9594972228385658
Cosine similarity (modified predictions vs actual):  0.9428090415820634
/tmp/ipykernel_814016/3853791278.py:11: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '12.85' has dtype incompatible with float32, please explicitly cast to a compatible dtype first.
  x_test2.iloc[:, 0] = 12.85

Even trying to change the features of the model. we are still getting more or less the same prediction. This model is not overfitting and indeed, has robust predictive power. Now, we try and interpret the data based on medical knowledge we have:

  1. At the start if the hemoglobin levels were less than or equal to 12.85 on average the samples 300 -- this is every sample of the dataset is split to the left side and hemoglobin levels are less than 12.85 then the level of specific gravity is considered. These features are both looked into the URI strip test done in hospitals can be done at home too.

  2. To the left we now see sodium on average less than or equal to 143.5 being considered. Moving along we see that sodium and specific gravity being considered. This also makes sense especially hemoglobin and specific gravity which are part of the tests done on the kidney function test in a URI strip. Sodium and Potassium are reabsorbed mostly by the kidney but here some is seen maybe they are some other conditions the patient is suffering from. Age less than or equal to 43.2 on average for some patients did not have chronic kidney disease whereas, three had it and their age surpassed 43 to a small extent age can be considered as determining factor. Staying on the left side of the tree we see more patients being classified as having chronic disease as an example after the division by specific gravity 141 samples were placed there without any further subdivision -- 141 in number. On the other hand, for those whose hemoglobin levels mean were greater than to 2.9, 21 samples whose samples but the there's something interesting going on since the gini index (measure of diversity in a set) is negative, normally 0 means that the same is pure and you'll find only those with chronic kidney disease here. But on the other side, where rbc=normal, the condition is true that is, the patients have less rbc=normal and the hemoglobin is less than 2.9, we see that 4 patients were placed in group of having chronic kidney disease and the other 2 didn't have it completing the left side of the tree.

  3. Let's move to the False subdivision where the hemoglobin for patients in this group was more than 12.85 at the root of the tree. In this group, the specific gravity is still being considered whose mean value 1.017 -- this is normal range if using a URI strip test. But later there's a subdivision less than 0.505, this is very low beyond the normal threshold and as you can see 16 patients were classified as having chronic kidney disease. Another branch looks at if the patient has diabetes mellitus (pancreas cells don't release enough insulin or insulin resistance) less than half of the patients, 5 were classified as not having chronic kidney disease and one had chronic kidney disease. Normally a consequence of diabetes mellitus is chronic kidney disease due to the predisposing symptoms of diabetes mellitus.

  4. In the last branch, to the further right we see samples moving from a specific gravity here the specific gravity is greater than 1.017 this is also beyond in the normal threshold then the dm=yes where there are 105 samples, the specific gravity was in the normal range to my knowledge and 104 samples were classified as not having chronic kidney disease and lastly one had chronic kidney disease in the sample.

  5. Most of the leaf nodes have a gini index of 0.0 this means the elements in the sample had one of that class in the leaf node this shows that some of the features selected by this decision tree could be really good features to be looked into during diagnosis or progression of a disease to know if the patients could have chronic, acute kidney disease as well as looking at other features like we'll look into when looking at the Logisitic regression classifier. However, the gini index -0.0 is still worrisome because it doesn't make a lot of sense negative 0? it could be a bug in the decision tree implementation or something else. You can find out what's up with that? trying a random forest would be interesting too because the feature dm=yes which means the patient was the group with diabetes mellitus or not seems to be reemerging at some points of the tree.

In [69]:
# interpreting the logistic regression model
clf_lr1.predict(x_test[:1])
Out[69]:
array([1])
In [70]:
# checking out the intercept this means that if every feature is 0 what the prediction would be
clf_lr1.intercept_
Out[70]:
array([5.76320108])
In [71]:
# checking out the coefficients we'll multiply this by the every value that corresponds to that feature e.g clf_lr1.coef[0] * age[0]
clf_lr1.coef_
Out[71]:
array([[ 0.0063346 , -0.01842495,  0.22342211,  1.32314306,  0.32025324,
         0.01268437, -0.00690186,  1.32353171, -0.02486324, -0.20097746,
        -0.23277008, -0.0221892 , -0.16596807,  0.19124526, -0.26965719,
        -0.0221892 ,  0.29493439, -0.01686371, -0.13511728,  0.15506898,
        -0.02987156, -0.06799412,  0.10095367, -0.02987156, -0.88891432,
         0.92187387, -0.02987156, -0.62832756,  0.66128711,  0.55442474,
        -0.44919695, -0.1021398 , -0.01686371,  0.00382733,  0.01612436,
        -0.0221892 , -0.38755198,  0.41282917,  0.08652115,  1.7289188 ,
        -1.81235197]])
In [72]:
# checking out the available columns
concat_cols_df.columns
Out[72]:
Index(['age', 'bp', 'sg', 'al', 'su', 'bgr', 'bu', 'sc', 'sod', 'pot', 'hemo',
       'ane=miss', 'ane=no', 'ane=yes', 'appet=good', 'appet=miss',
       'appet=poor', 'ba=miss', 'ba=notpresent', 'ba=present', 'cad=miss',
       'cad=no', 'cad=yes', 'dm=miss', 'dm=no', 'dm=yes', 'htn=miss', 'htn=no',
       'htn=yes', 'pc=abnormal', 'pc=miss', 'pc=normal', 'pcc=miss',
       'pcc=notpresent', 'pcc=present', 'pe=miss', 'pe=no', 'pe=yes',
       'rbc=abnormal', 'rbc=miss', 'rbc=normal'],
      dtype='object')
In [ ]:
# make a dataframe to help easily grab the coefficients for writing the formula and visualizing the data
# transpose to see it clearly
important_features2 = clf_lr1.coef_[0]
column_coef = pd.DataFrame(list(zip(important_features2.T.ravel("C").tolist(), features)),columns = ["coefficient", "feature"])
column_coef["coefficient"] = column_coef["coefficient"].astype("float32")
column_coef.T
Out[ ]:
0 1 2 3 4 5 6 7 8 9 ... 31 32 33 34 35 36 37 38 39 40
coefficient 0.006335 -0.018425 0.223422 1.323143 0.320253 0.012684 -0.006902 1.323532 -0.024863 -0.200977 ... -0.10214 -0.016864 0.003827 0.016124 -0.022189 -0.387552 0.412829 0.086521 1.728919 -1.812352
feature age bp sg al su bgr bu sc sod pot ... pc=normal pcc=miss pcc=notpresent pcc=present pe=miss pe=no pe=yes rbc=abnormal rbc=miss rbc=normal

2 rows × 41 columns

In [74]:
# arrange the coeffiecients in descending order to know the most likely features
column_coef.sort_values(by=["coefficient"], axis = 0, inplace=True, ascending=False)
print(column_coef.head(10))
    coefficient      feature
39     1.728919     rbc=miss
7      1.323532           sc
3      1.323143           al
25     0.921874       dm=yes
28     0.661287      htn=yes
29     0.554425  pc=abnormal
37     0.412829       pe=yes
4      0.320253           su
16     0.294934   appet=poor
2      0.223422           sg

missing rbcs was a most important feature. I will not take that since it mostly has missing values in fact 152 sample patient results are missing. I think the best course of action is to get the data from the patients about RBCs or drop the column entirely for modelling -- there's an issue with that rationale since some patients may not come back or can't afford to do the test. al - albumin levels, sc - serum creatinine, dm=yes the patient having diabetes mellitus and htn - hypertension these are all crucial kidney function tests or predisposing features of a patient that could be having chronic kidney disease based on my background. The additional features that have been highlighted are also important but have lower values as compared to the ones mentioned prior.

In a more technical perspective these features make sense for instance the albumin which is a large protein that is not supposed to pass through as a glomerular filtrate in the proximal convoluted tubule to the urine since the patient could have a high blood pressure yet another predisposing feature that could fuel kidney diseases either acutely or chronicly to the patients being classified as having chronic kidney disease at least for those features. How does hypertension do it? If the blood pressure is high according to the anatomy of the kidney we'll see a faster rate of filtration and the pressure may damage the glomerulus. Imagine trying to use a sieve with a fast flowing liquid and particules just a bit larger than the sieve. Over a long period of time some of those particles may pass through.

In [ ]:
x_test[:1] # see all the features in the first column
Out[ ]:
age bp sg al su bgr bu sc sod pot ... pc=normal pcc=miss pcc=notpresent pcc=present pe=miss pe=no pe=yes rbc=abnormal rbc=miss rbc=normal
147 60.0 60.0 1.01 3.0 1.0 288.0 36.0 1.7 130.0 3.0 ... 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0

1 rows × 41 columns

In [76]:
x_test.shape # see the number of rows and columns
Out[76]:
(100, 41)
In [77]:
# writing the logistic regression formula
# Bo intercept, B1 x1n
# writing the denominator based on the wikipedia entry https://en.wikipedia.org/wiki/Logistic_regression
# 1/ np.exp(-(weight * coefn + clf_lr1.intercept_))
weights_int_bias = clf_lr1.intercept_ + (column_coef.coefficient[0] * 60.0) + (column_coef.coefficient[1] * 60.0) + (column_coef.coefficient[2] * 1.01) + (column_coef.coefficient[3] + 3.0) + (column_coef.coefficient[4] + 1.0) + (column_coef.coefficient[5] * 288) + (column_coef.coefficient[6] * 36.0) + (column_coef.coefficient[7] * 1.7) + (column_coef.coefficient[8] * 130) + (column_coef.coefficient[9] * 3.0) + (column_coef.coefficient[10] * 7.9) + (column_coef.coefficient[11] * 0.0) + (column_coef.coefficient[12] * 0.0) + (column_coef.coefficient[13] * 1.0) + (column_coef.coefficient[14] * 0.0) + (column_coef.coefficient[15] * 0.0) + (column_coef.coefficient[16] * 1.0) + (column_coef.coefficient[17] * 0.0) + (column_coef.coefficient[18] * 1.0) + (column_coef.coefficient[19] * 0.0) + (column_coef.coefficient[20] * 0.0) + (column_coef.coefficient[21] * 1.0) + (column_coef.coefficient[22] * 0.0) + (column_coef.coefficient[23] * 0.0) + (column_coef.coefficient[24] * 1.0) + (column_coef.coefficient[25] * 0.0) +  (column_coef.coefficient[26] * 0.0) + (column_coef.coefficient[27] * 1.0) + (column_coef.coefficient[28] * 1.0) + (column_coef.coefficient[29] * 0.0) + (column_coef.coefficient[30] * 0.0) + (column_coef.coefficient[31] * 0.0) + (column_coef.coefficient[32] * 0.0) + (column_coef.coefficient[33] * 1.0) + (column_coef.coefficient[34] * 0.0) + (column_coef.coefficient[35] * 0.0) + (column_coef.coefficient[36] * 0.0) + (column_coef.coefficient[37] * 0.0) + (column_coef.coefficient[38] * 0.0) + (column_coef.coefficient[39] * 1.0)
In [78]:
# add the sigmoid function to make the decision
# One way to make the loss function
def sigmoid(x):
    return np.exp(x)/(1+np.exp(x))


print(sigmoid(weights_int_bias))
[0.99999414]
In [79]:
# according to wikipedia implementation: sigmoid function
1 / (1 + np.exp(-weights_int_bias))
Out[79]:
array([0.99999414])

Get a single row of features and add it onto the model above and confirm if you get the same result as above.

In [80]:
clf_lr1.predict(x_test[:1]) # has chronic kidney disease for the 147th id
Out[80]:
array([1])

Conclusion:¶

I think I was onto something because if you review the feature importance the hemoglobin, specific gravity(sg), sodium(sod), age were the most important features. Others that are related: al(albumin), sc(serum creatinine) which are medically relevant as well. I've done some tests with a uri strip to figure out if someone has an issue with their kidney and these are the parameters that point out to dysfunction of the kidney. Others like having diabetes mellitus though this is related to pancreatic beta cells issues, sodium levels taken up again by the kidney and age could also be indicators. The decision tree is also interesting but could use some tuning. The intercept says that on default the patient doesn't have chronic disease as well but the decision tree the start was hemoglobin. In future I will update the plots to show how the decision was made better. Otherwise, I'd discuss these results with a medical practitioner like a urologist. The explainable machine techniques further gives us confidence that the models have robust predictive power as we explain the results to the medical practitioner. The reader can try using a random forest classifier to see if the results are similar. What do you think?